THE ANNALS 
of 

MATHEMATICAL 

STATISTICS 

(Founded by H. C. Carver)

The Official Journal of the Institute of 
Mathematical Statistics 


VOLUME XIII 


1942 



THE ANNALS 

OF MATHEMATICAL STATISTICS 


EDITED BY

S. S. WILKS, Editor

A. T. CRAIG J. NEYMAN


H. C. Carver
H. Cramér
W. E. Deming
G. Darmois


WITH THE COOPERATION OF

R. A. Fisher
T. C. Fry
H. Hotelling


R. von Mises
E. S. Pearson
H. L. Rietz
W. A. Shewhart


The Annals of Mathematical Statistics is published quarterly by the
Institute of Mathematical Statistics, Mt. Royal & Guilford Aves., Baltimore,
Md. Subscriptions, renewals, orders for back numbers and other business
communications should be sent to the Annals of Mathematical Statistics, Mt.
Royal & Guilford Aves., Baltimore, Md., or to the Secretary of the Institute
of Mathematical Statistics, E. G. Olds, Carnegie Institute of Technology,
Pittsburgh, Pa. Changes in mailing address which are to become effective for
a given issue should be reported to the Secretary on or before the 15th of the
month preceding the month of that issue. The months of issue are March,
June, September and December.

Manuscripts for publication in the Annals of Mathematical Statistics
should be sent to S. S. Wilks, Fine Hall, Princeton, New Jersey. Manuscripts
should be typewritten double-spaced with wide margins, and the original
should be submitted. Footnotes should be reduced to a minimum and whenever
possible replaced by a bibliography at the end of the paper; formulae in
footnotes should be avoided. Figures, charts, and diagrams should be drawn on
plain white paper or tracing cloth in black India ink twice the size they are to
be printed. Authors are requested to keep in mind typographical difficulties
of complicated mathematical formulae.

Authors will ordinarily receive only galley proofs. Fifty reprints without
covers will be furnished free. Additional reprints and covers furnished at cost.

The subscription price for the Annals is $6.00 per year. Single copies $1.69.
Back numbers are available at $6.00 per volume, or $1.60 per single issue.


Composed and Printed at the 
WAVERLY PRESS, Inc. 
Baltimore, Md., U. S. A.



CONTENTS OF VOLUME XIII

Articles

Anderson, Paul H. Distributions in Stratified Sampling

Anderson, R. L. Distribution of the Serial Correlation Coefficient

Battin, I. L. On the Problem of Multiple Matching

Bernstein, S. (Translated by Emma Lehmer). Solution of a Mathematical Problem Connected with the Theory of Heredity

Burr, Irving W. Cumulative Frequency Functions

Camp, Burton H. Some Recent Advances in Mathematical Statistics, I

Chung, Kai-Lai. On Mutually Favorable Events

Craig, C. C. Some Recent Advances in Mathematical Statistics, II

Dwyer, P. S. Grouping Methods

Geiringer, Hilda. Observations on Analysis of Variance Theory

Hart, B. I., and von Neumann, John. Tabulation of the Probabilities for the Ratio of the Mean Square Successive Difference to the Variance

Hasel, A. A. Estimation of Volume in Timber Stands by Strip Sampling

Kimball, Bradford F. Limited Type of Primary Probability Distribution Applied to Annual Flood Flows

Koopmans, Tjalling. Serial Correlation and Quadratic Forms in Normal Variables

Lonseth, A. T. Systems of Linear Equations with Coefficients Subject to Error

Lotka, Alfred J. The Progeny of an Entire Population

Mann, H. B., and Wald, A. On the Choice of the Number of Intervals in the Application of the Chi Square Test

Samuelson, P. A. A Method of Determining Explicitly the Coefficients of the Characteristic Equation

Satterthwaite, Franklin E. A Generalized Analysis of Variance

Satterthwaite, Franklin E. Linear Restrictions on Chi-Square

Satterthwaite, Franklin E. Generalized Poisson Distribution

Scheffé, Henry. On the Theory of Testing Composite Hypotheses with One Constraint

Scheffé, Henry. On the Ratio of the Variances of Two Normal Populations

Stephan, Frederick F. An Iterative Method of Adjusting Sample Frequency Tables when Expected Marginal Tables are Known

von Mises, R. On the Correct Use of Bayes' Formula

von Neumann, John, and Hart, B. I. Tabulation of the Probabilities for the Ratio of the Mean Square Successive Difference to the Variance



Wald, Abraham. Asymptotically Shortest Confidence Intervals

Wald, A., and Mann, H. B. On the Choice of the Number of Intervals in the Application of the Chi Square Test

Wald, Abraham. Setting of Tolerance Limits when the Sample is Large

Wilks, S. S. Statistical Prediction with Special Reference to the Problem of Tolerance Limits

Wolfowitz, J. Additive Partition Functions and a Class of Statistical Hypotheses


Notes 

Beckenbach, E. F. Convexity Properties of Generalized Mean Value Functions

Birnbaum, Z. W. An Inequality for Mills' Ratio

Curtiss, J. H. A Note on the Theory of Moment Generating Functions

Dieulefait, Carlos E. Note on a Method of Sampling

Fischer, Carl H. A Sequence of Discrete Variables Exhibiting Correlation Due to Common Elements

Geiringer, Hilda. A Note on the Probability of Arbitrary Events

Girshick, M. A. Note on the Distribution of Roots of a Polynomial with Random Complex Coefficients

Girshick, M. A. A Correction

Hart, B. I. Significance Levels for the Ratio of the Mean Square Successive Difference to the Variance

Lukacs, Eugene. A Characterization of the Normal Distribution

Mann, Henry B. The Construction of Orthogonal Latin Squares

Paulson, Edward. An Approximate Normalization of the Analysis of Variance Distribution

Paulson, Edward. A Note on the Estimation of Some Mean Values for a Bivariate Distribution

von Neumann, John. A Further Remark Concerning the Distribution of the Ratio of the Mean Square Successive Difference to the Variance

Wald, Abraham. On the Power Function of the Analysis of Variance Test

Miscellaneous 

Abstracts of Papers

Annual Report of the Secretary-Treasurer of the Institute

Report of the Dallas Meeting of the Institute

Report of the New York Meeting of the Institute

Report of the Poughkeepsie Meeting of the Institute




















Contents

Distribution of the Serial Correlation Coefficient. R. L. Anderson

Serial Correlation and Quadratic Forms in Normal Variables. Tjalling Koopmans

A Generalized Analysis of Variance. Franklin E. Satterthwaite

Distributions in Stratified Sampling. Paul H. Anderson

Solution of a Mathematical Problem Connected with the Theory of Heredity. S. Bernstein (Translated by Emma Lehmer)

Some Recent Advances in Mathematical Statistics, I. Burton H. Camp

Some Recent Advances in Mathematical Statistics, II. C. C. Craig

Notes:

A Further Remark Concerning the Distribution of the Ratio of the Mean Square Successive Difference to the Variance. John von Neumann

Convexity Properties of Generalized Mean Value Functions. E. F. Beckenbach

Report of the New York Meeting of the Institute

Report of the Dallas Meeting of the Institute

Annual Report of the Secretary-Treasurer of the Institute

Abstracts of Papers


Vol. XIII, No. 1, March, 1942










DISTRIBUTION OF THE SERIAL CORRELATION COEFFICIENT 

By R. L. Anderson 
North Carolina State College 

1. Introduction. The problem of serial correlation was brought to the attention
of statisticians by Yule in 1921 [9]. Both Yule and Bartlett [2] have shown
that the ordinary tests of significance are invalidated if successive observations
are not independent of one another. The serial correlation coefficient has been
introduced as a measure of the relationship between successive values of a
variable ordered in time or space. Interest in the serial correlation problem was
stimulated further by the new concepts of time series analysis discussed by
Wold [8].

We shall define the serial correlation coefficient for lag $L$ and $N$ observations
to be¹

$$ {}_LR_N = \frac{{}_LC_N}{V_N} = \frac{X_1X_{1+L} + X_2X_{2+L} + \cdots + X_NX_L - (\sum X)^2/N}{\sum X^2 - (\sum X)^2/N}, $$

where $C$ and $V$ are the covariance and variance respectively and the $X$'s are
considered to be independently normally distributed about the same mean with
unit variance. If the population variance were known a priori, the variates
could be transformed so that they would have unit variance; under such an
unusual circumstance, the only distribution required would be that of the serial
covariance. Tintner has given a test of significance for the serial covariance [6]
and for the correlation coefficient [7] by using a method of selected items. The
author has presented the distribution of the serial covariance and of the serial
correlation coefficient not corrected for the mean in a recent doctoral thesis [1].
The distributions of ${}_1R_N$ not corrected for the mean will be mentioned in the
sections which follow.
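As a minimal illustration of the definition above (a sketch, not the author's computing method), the circular coefficient can be computed directly from a list of observations. The function name `serial_correlation` is hypothetical; the lagged products wrap around so that $X_{N+i} = X_i$, and both numerator and denominator subtract $(\sum X)^2/N$.

```python
def serial_correlation(xs, lag=1):
    """Circular serial correlation coefficient _L R_N = _L C_N / V_N.

    The lagged products wrap around (X_{N+i} = X_i), and numerator and
    denominator are both corrected for the mean via (sum X)^2 / N.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    correction = sum(xs) ** 2 / n                           # (sum X)^2 / N
    lagged = sum(xs[i] * xs[(i + lag) % n] for i in range(n))
    c = lagged - correction                                 # serial covariance _L C_N
    v = sum(x * x for x in xs) - correction                 # V_N = sum (X - mean)^2
    return c / v


print(serial_correlation([1, -1, 1, -1]))         # -1.0: alternating series, lag 1
print(serial_correlation([1, -1, 1, -1], lag=2))  # 1.0: same series, lag 2
```

A constant series has $V_N = 0$ and raises `ZeroDivisionError`, which is the honest outcome since the coefficient is undefined there.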

2. Small sample distributions for lag 1. W. G. Cochran has suggested that
we use a result given in his article on quadratic forms to derive the distributions
of the serial correlation coefficient for small samples [3]. If $X_1, X_2, \ldots, X_N$
are independently normally distributed with variance 1 and mean 0, then

"Every quadratic form $\sum a_{ij}X_iX_j$ is distributed like $\sum_{i=1}^{r}\lambda_iu_i$, where $r$ is the
rank of the matrix, $A$, of the quadratic form, the $u$'s are independently
distributed as $\chi^2$, each with 1 d.f., and the $\lambda$'s are the non-zero latent roots
of the characteristic equation of $A$" [3, p. 179].

If each $\lambda_i$ appears $k_i$ times as a latent root, $u_i$ will be distributed as $\chi^2$ with $k_i$
degrees of freedom.

¹ This circular definition of the serial correlation coefficient was suggested by H.
Hotelling.




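If we set $L = 1$ in the above definition of the serial covariance, we note that
the characteristic equation of ${}_1C_N$ is

$$ {}_1F_N = \begin{vmatrix} a_0 & a_1 & a_2 & \cdots & a_{N-1} \\ a_{N-1} & a_0 & a_1 & \cdots & a_{N-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_1 & a_2 & a_3 & \cdots & a_0 \end{vmatrix} = 0, $$

where $a_0 = -(\lambda + 1/N)$, $a_1 = a_{N-1} = (N-2)/2N$, and all other $a$'s $= -1/N$.
Since this is a circulant, ${}_1F_N = \prod_{k=0}^{N-1}\left(\sum_{j=0}^{N-1}a_j\omega_k^{\,j}\right)$, where $\omega_k = e^{2\pi ik/N}$ is the $k$th root of unity. Hence,

$$ {}_1F_N = \prod_{k=0}^{N-1}\left[-\left(\lambda + \frac{1}{N}\right) + \frac{N-2}{2N}\left(\omega_k + \omega_k^{-1}\right) - \frac{1}{N}\sum_{j=2}^{N-2}\omega_k^{\,j}\right]. $$

Since

$$ \sum_{j=2}^{N-2}\omega_k^{\,j} = \begin{cases} -\left(\omega_k + 1 + \omega_k^{-1}\right), & k \neq 0, \\ N-3, & k = 0, \end{cases} $$

we have

$$ {}_1F_N = -\lambda\prod_{k=1}^{N-1}\left[-\lambda + \tfrac{1}{2}\left(\omega_k + \omega_k^{-1}\right)\right] = -\lambda\prod_{k=1}^{N-1}\left[\cos\frac{2\pi k}{N} - \lambda\right] = 0. $$

Hence $\lambda_k = \cos(2\pi k/N)$, $(k = 1, 2, \ldots, N-1)$, and, since $\lambda_k = \lambda_{N-k}$,

$$ {}_1C_N = \begin{cases} \displaystyle\sum_{k=1}^{(N-1)/2}\lambda_ku_k, & N \text{ odd}, \\ \displaystyle\sum_{k=1}^{(N-2)/2}\lambda_ku_k - u, & N \text{ even}, \end{cases} $$

where $u_k$ is distributed as $\chi^2$ with 2 d.f. and $u$ with 1 d.f. At the same time,
we note that $V_N = \sum(X_i - \bar X)^2$ is distributed as $\chi^2$ with $N - 1$ d.f.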
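The circulant argument can be checked numerically. In the sketch below (helper names hypothetical), the matrix of the quadratic form ${}_LC_N$ is built explicitly, and each vector $v_j = \cos(2\pi kj/N)$ is confirmed to be an eigenvector with root $\cos(2\pi kL/N)$, the constant vector ($k = 0$) having root 0. A general lag $L$ is allowed because the same calculation is used in Section 5.

```python
import math


def serial_cov_matrix(n, lag=1):
    """Matrix A with X'AX = sum_i X_i X_{i+lag} - (sum X)^2 / N (indices mod N)."""
    a = [[-1.0 / n] * n for _ in range(n)]
    for i in range(n):
        j = (i + lag) % n
        # split the coefficient of X_i X_j evenly between the (i, j) and (j, i) entries
        a[i][j] += 0.5
        a[j][i] += 0.5
    return a


def roots_check(n, lag=1, tol=1e-9):
    """True if A v = lambda v for v_j = cos(2 pi k j / N), lambda = cos(2 pi k lag / N), k != 0."""
    a = serial_cov_matrix(n, lag)
    for k in range(n):
        v = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
        lam = 0.0 if k == 0 else math.cos(2 * math.pi * k * lag / n)
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        if any(abs(av[i] - lam * v[i]) > tol for i in range(n)):
            return False
    return True


print(roots_check(6), roots_check(7), roots_check(12, lag=3))
```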

The general procedure in deriving the distribution of ${}_1R_N$ is as follows. We
determine the joint density function of the $u$'s which form the distributions of
${}_1C_N\,(= {}_1R_NV_N)$ and $V_N$. The $u$'s are integrated out, leaving the joint density
function of ${}_1R_N$ and $V_N$. The distribution of ${}_1R_N$ is obtained by integrating
with respect to $V_N$ from 0 to $\infty$. As examples, derivations of the distributions
of ${}_1R_6$ and ${}_1R_7$ have been included. In order to simplify the results, the first
subscripts have been dropped from ${}_1R_N$.

Distribution of $R_6$. $R_6V_6 = \lambda_1u_1 + \lambda_2u_2 - u$ and $V_6 = u_1 + u_2 + u$, where
$u_1$ and $u_2$ are distributed as $\chi^2$ with 2 d.f. and $u$ with 1 d.f., and $\lambda_1 = \tfrac12$ and
$\lambda_2 = -\tfrac12$. Hence the density function of the $u$'s is

$$ D(u_1, u_2, u) = \left(4\sqrt{2\pi}\right)^{-1}u^{-1/2}e^{-V_6/2}. $$







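Since $u_1 = [V_6(R_6 - \lambda_2) + u(1 + \lambda_2)]/(\lambda_1 - \lambda_2)$ and

$$ u_2 = [V_6(\lambda_1 - R_6) - u(1 + \lambda_1)]/(\lambda_1 - \lambda_2), $$

$u$ must vary between 0 and $V_6(\lambda_1 - R_6)/(1 + \lambda_1)$ for $\lambda_2 < R_6 < \lambda_1$ and between
$V_6(\lambda_2 - R_6)/(1 + \lambda_2)$ and $V_6(\lambda_1 - R_6)/(1 + \lambda_1)$ for $-1 < R_6 \le \lambda_2$. After
integrating with respect to $u$ between these limits and then with respect to $V_6$
from 0 to $\infty$, we obtained the following density function for $R_6$:

$$ D(R_6) = \begin{cases} \dfrac{3}{2}\,\dfrac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)}, & \lambda_2 < R_6 < \lambda_1, \\[2ex] \dfrac{3}{2}\left[\dfrac{\sqrt{\lambda_1 - R_6}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} + \dfrac{\sqrt{\lambda_2 - R_6}}{\sqrt{1 + \lambda_2}\,(\lambda_2 - \lambda_1)}\right], & -1 < R_6 < \lambda_2. \end{cases} $$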

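The cumulative probability function has the same general form:

$$ P(R_6 > R') = \begin{cases} \dfrac{(\lambda_1 - R')^{3/2}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)} + \dfrac{(\lambda_2 - R')^{3/2}}{\sqrt{1 + \lambda_2}\,(\lambda_2 - \lambda_1)}, & -1 < R' < \lambda_2, \\[2ex] \dfrac{(\lambda_1 - R')^{3/2}}{\sqrt{1 + \lambda_1}\,(\lambda_1 - \lambda_2)}, & \lambda_2 < R' < \lambda_1. \end{cases} $$

Distribution of $R_7$. $R_7V_7 = \lambda_1u_1 + \lambda_2u_2 + \lambda_3u_3$ and $V_7 = u_1 + u_2 + u_3$,
where each $u$ is distributed as $\chi^2$ with 2 d.f. Hence,

$$ u_1 = \frac{V_7(R_7 - \lambda_2) + u_3(\lambda_2 - \lambda_3)}{\lambda_1 - \lambda_2}, \qquad u_2 = \frac{V_7(\lambda_1 - R_7) - u_3(\lambda_1 - \lambda_3)}{\lambda_1 - \lambda_2}. $$

For $\lambda_2 < R_7 < \lambda_1$, $0 < u_3 < V_7(\lambda_1 - R_7)/(\lambda_1 - \lambda_3)$; for $\lambda_3 < R_7 < \lambda_2$,
$V_7(\lambda_2 - R_7)/(\lambda_2 - \lambda_3) < u_3 < V_7(\lambda_1 - R_7)/(\lambda_1 - \lambda_3)$. Using these limits, we
derived the following density function for $R_7$:

$$ D(R_7) = \begin{cases} \dfrac{2(\lambda_1 - R_7)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)}, & \lambda_2 < R_7 < \lambda_1, \\[2ex] \dfrac{2(\lambda_1 - R_7)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)} + \dfrac{2(\lambda_2 - R_7)}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3)}, & \lambda_3 < R_7 \le \lambda_2. \end{cases} $$

The cumulative probability function is similar, except that the coefficient 2
cancels and the exponent of each numerator is raised by one.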
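A rough empirical check of the two cumulative functions is straightforward: simulate independent $N(0, 1)$ samples of size 6 and 7, compute the lag-1 coefficient corrected for the mean, and compare the observed tail frequencies with the closed forms. This is a sketch under those assumptions only; the helper names, seed, and sample size are arbitrary illustrative choices.

```python
import math
import random


def r1(xs):
    """Lag-1 circular serial correlation corrected for the mean."""
    n = len(xs)
    m = sum(xs) / n
    d = [x - m for x in xs]
    return sum(d[i] * d[(i + 1) % n] for i in range(n)) / sum(e * e for e in d)


def tail_r6(r):
    """P(R_6 > r) from the closed form, lambda_1 = 1/2, lambda_2 = -1/2."""
    l1, l2 = 0.5, -0.5
    if r >= l1:
        return 0.0
    if r <= -1.0:
        return 1.0
    p = (l1 - r) ** 1.5 / (math.sqrt(1 + l1) * (l1 - l2))
    if r < l2:
        p += (l2 - r) ** 1.5 / (math.sqrt(1 + l2) * (l2 - l1))
    return p


def tail_r7(r):
    """P(R_7 > r): the R_7 density integrated, lambda_k = cos(2 pi k / 7)."""
    l1, l2, l3 = (math.cos(2 * math.pi * k / 7) for k in (1, 2, 3))
    if r >= l1:
        return 0.0
    if r <= l3:
        return 1.0
    p = (l1 - r) ** 2 / ((l1 - l2) * (l1 - l3))
    if r < l2:
        p += (l2 - r) ** 2 / ((l2 - l1) * (l2 - l3))
    return p


def simulated_tail(n, r, trials=20000, seed=1942):
    """Monte Carlo estimate of P(R_N > r) for independent N(0, 1) observations."""
    rng = random.Random(seed)
    hits = sum(r1([rng.gauss(0.0, 1.0) for _ in range(n)]) > r for _ in range(trials))
    return hits / trials


for r in (-0.6, 0.0, 0.3):
    print(r, round(tail_r6(r), 3), round(simulated_tail(6, r), 3))
```

Because $R$ is scale invariant, the unit variance used in the simulation loses no generality.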

General formulas for N odd. It appears that the joint density function for $R_N$ and
$V_N$ for $N$ odd is

$$ D(R_N, V_N) = K\,V_N^{(N-3)/2}e^{-V_N/2}\sum_{i=1}^{m}\frac{(\lambda_i - R_N)^{(N-5)/2}}{a_i}, \qquad \lambda_{m+1} \le R_N < \lambda_m, $$

where $a_i = \prod_{j=1,\,j\ne i}^{(N-1)/2}(\lambda_i - \lambda_j)$ and $1/K = 2^{(N-1)/2}\,\Gamma[\tfrac12(N-3)]$.² This
formula holds for $N = 5$ and 7; we will show that it holds for $N + 2$, assuming
it true for $N$. If we set $k = \tfrac12(N+1)$, $R_{N+2}V_{N+2} = R_NV_N + \lambda_ku_k$ and
$V_{N+2} = V_N + u_k$; hence,

$$ R_N = \frac{R_{N+2}V_{N+2} - \lambda_ku_k}{V_{N+2} - u_k}, \qquad V_N = V_{N+2} - u_k. $$

If we make the substitution $u_k = u'_kV_{N+2}$, the density function for $u'_k$, $V_{N+2}$
and $R_{N+2}$ is

$$ \frac{K}{2}\,V_{N+2}^{(N-1)/2}e^{-V_{N+2}/2}\sum_{i=1}^{m}\frac{\left[(\lambda_i - R_{N+2}) - u'_k(\lambda_i - \lambda_k)\right]^{(N-5)/2}}{a_i}. $$

In order to obtain the distribution of $V_{N+2}$ and $R_{N+2}$, we must integrate out $u'_k$.
The limits of integration differ for different values of $m$. We note that

$$ u'_k = (R_N - R_{N+2})/(R_N - \lambda_k), $$

except that $u'_k = 0$ when $\lambda_k < R_N < \lambda_{m+1}$, since $\lambda_{m+1} < R_{N+2} < \lambda_m$ and $u'_k$
cannot be negative. For $R_{N+2} > \lambda_k$, $u'_k < 1$; hence, if $R_N$ is replaced by a
larger (smaller) quantity, $u'_k$ will be larger (smaller).

For $m = 1$ ($\lambda_2 < R_{N+2} < \lambda_1$), we need to consider only that region for which
$\lambda_2 < R_N < \lambda_1$. In this region, $0 < u'_k < (\lambda_1 - R_{N+2})/(\lambda_1 - \lambda_k)$ and the density
function of $R_{N+2}$ and $V_{N+2}$ is

$$ \phi(V_{N+2})\,(\lambda_1 - R_{N+2})^{(N-3)/2}/a'_1, $$

where $\phi(V_{N+2}) = V_{N+2}^{(N-1)/2}e^{-V_{N+2}/2}/\{2^{(N+1)/2}\,\Gamma[\tfrac12(N-1)]\}$ and $a'_i = \prod_{j=1,\,j\ne i}^{(N+1)/2}(\lambda_i - \lambda_j)$.

For $m = 2$ ($\lambda_3 < R_{N+2} < \lambda_2$), we must consider two regions in the $R_N$ plane.
When $\lambda_2 < R_N < \lambda_1$,

$$ \frac{\lambda_2 - R_{N+2}}{\lambda_2 - \lambda_k} \le u'_k \le \frac{\lambda_1 - R_{N+2}}{\lambda_1 - \lambda_k}, $$

and when $\lambda_3 < R_N < \lambda_2$, $0 < u'_k < (\lambda_2 - R_{N+2})/(\lambda_2 - \lambda_k)$. If we combine the
density functions for these two regions, we find that

$$ D(R_{N+2}, V_{N+2}) = \phi(V_{N+2})\sum_{i=1}^{2}\frac{(\lambda_i - R_{N+2})^{(N-3)/2}}{a'_i}, \qquad \lambda_3 \le R_{N+2} \le \lambda_2. $$

Similar results can be obtained for the other regions.

² Note that we are omitting the lag subscript from ${}_1R_N$.

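Finally we conclude that for $N$ odd,

$$ D({}_1R_N) = \tfrac12(N-3)\sum_{i=1}^{m}\frac{(\lambda_i - {}_1R_N)^{(N-5)/2}}{a_i}, \qquad \lambda_{m+1} \le {}_1R_N \le \lambda_m, $$

$$ P({}_1R_N > R') = \sum_{i=1}^{m}\frac{(\lambda_i - R')^{(N-3)/2}}{a_i}, \qquad \lambda_{m+1} \le R' \le \lambda_m, $$

where $a_i = \prod_{j=1,\,j\ne i}^{(N-1)/2}(\lambda_i - \lambda_j)$. The general density function for $N$ odd and
${}_1R_N$ not corrected for the sample mean is [1]

$$ D({}_1R_N) = \tfrac12(N-2)\sum_{i=m}^{(N-1)/2}\frac{({}_1R_N - \lambda_i)^{(N-4)/2}}{a_i}, \qquad \lambda_m \le {}_1R_N \le \lambda_{m-1}, $$

where $\lambda_0 = 1$ and $a_i = \prod_{j=1,\,j\ne i}^{(N-1)/2}(\lambda_j - \lambda_i)\sqrt{1 - \lambda_i}$.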

General formulas for N even. Using the same method as above, we can show
that the same formulas hold for $N$ even and ${}_1R_N$ corrected for the mean except
that in this case $a_i = \prod_{j=1,\,j\ne i}^{(N-2)/2}(\lambda_i - \lambda_j)\sqrt{\lambda_i + 1}$. No general formulas
were derived for $N$ even and ${}_1R_N$ not corrected for the mean.
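The closed-form tail probabilities are simple enough to invert numerically. The sketch below (helper names hypothetical) evaluates $P({}_1R_N > R')$ for lag 1, corrected for the mean, with $a_i$ as given above for $N$ odd and $N$ even. It then solves for the single-tail 5% and 1% points by bisection. For small $N$ it agrees with the corresponding entries of Table I to within about 0.001 (for example, it gives 0.2978 where the table prints 0.297 for $N = 5$).

```python
import math


def exact_tail(n, r):
    """P(1R_N > r) for lag 1, corrected for the mean, from the closed-form sums.

    Uses the distinct roots lambda_i = cos(2 pi i / N) of multiplicity 2, with
    a_i = prod_{j != i} (lambda_i - lambda_j), times sqrt(1 + lambda_i) for N even.
    """
    half = (n - 1) // 2
    lam = [math.cos(2 * math.pi * i / n) for i in range(1, half + 1)]   # decreasing
    lower = -1.0 if n % 2 == 0 else lam[-1]                             # smallest attainable R
    if r >= lam[0]:
        return 0.0
    if r <= lower:
        return 1.0
    total = 0.0
    for i, li in enumerate(lam):
        if li <= r:
            break                               # only roots above r contribute (i <= m)
        a = 1.0
        for j, lj in enumerate(lam):
            if j != i:
                a *= li - lj
        if n % 2 == 0:
            a *= math.sqrt(1 + li)
        total += (li - r) ** ((n - 3) / 2) / a
    return total


def significance_point(n, alpha, upper=True):
    """Solve P(R > r) = alpha (upper tail) or P(R < r) = alpha (lower tail) by bisection."""
    target = alpha if upper else 1.0 - alpha
    lo, hi = -1.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if exact_tail(n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (5, 7, 9):
    print(n, [round(significance_point(n, a, up), 3)
              for a, up in ((0.05, True), (0.01, True), (0.05, False), (0.01, False))])
```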

3. Large sample distributions for lag 1. The simultaneous density function
of $C$ and $V$, where we will drop the subscripts for convenience, is

$$ D(C, V) = (2\pi i)^{-2}\int_{-i\infty}^{i\infty}\int_{-i\infty}^{i\infty}\phi(s, t)\,e^{-sC - tV}\,ds\,dt, $$

$$ \phi(s, t) = K\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}e^{\theta}\,dX_1\,dX_2\cdots dX_N, $$

where $\theta = -\tfrac12\sum X_i^2 + t\left[\sum(X_i - \bar X)^2\right] + s\left[X_1X_2 + X_2X_3 + \cdots + X_NX_1 - (\sum X)^2/N\right]$,
and $s$ and $t$ are pure imaginaries.

$\phi(s, t) = \Delta^{-1/2}$, where $\Delta$ is the determinant of the quadratic form $-2\theta$. The
determinant was evaluated by the method of circulants; we found that

$$ \Delta = \prod_{k=1}^{N-1}\left[1 - 2(t + s\lambda_k)\right], \qquad \lambda_k = \cos\frac{2\pi k}{N}. $$

Set

$$ K = \log\phi(s, t) = \sum_{i,j}\kappa_{ij}\,\frac{s^it^j}{i!\,j!}. $$

If $K$ is expanded in series, we find that

$$ \kappa_{ij} = m!\,2^m\sum_{k=1}^{N-1}\lambda_k^{\,i}, \qquad m = i + j - 1. $$

For $N > 5$, we might make the following summations: $\sum\lambda_k = -1$, $\sum\lambda_k^2 = \tfrac12(N-2)$,
$\sum\lambda_k^3 = -1$, $\sum\lambda_k^4 = \tfrac18(3N-8)$, $\sum\lambda_k^5 = -1$. Hence $\kappa_{10} = E(C) = -1$,
$\kappa_{01} = E(V) = N - 1$, $\kappa_{20} = \sigma_C^2 = N - 2$, $\kappa_{02} = \sigma_V^2 = 2(N-1)$, $\kappa_{11} = \mu_{11} = -2$,
$\kappa_{30} = -8$, $\kappa_{40} = 6(3N - 8)$, $\kappa_{21} = 4(N-2)$, $\kappa_{12} = -8$, etc.
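The semi-invariant formula can be checked directly from the roots. The sketch below (hypothetical helper `kappa`) evaluates $\kappa_{ij} = m!\,2^m\sum\lambda_k^{\,i}$ numerically and compares it with the closed forms listed above, which assume $N > 5$.

```python
import math


def kappa(i, j, n):
    """Semi-invariant kappa_ij of (C, V) for lag 1: m! 2^m sum_k lambda_k^i, m = i + j - 1."""
    m = i + j - 1
    power_sum = sum(math.cos(2 * math.pi * k / n) ** i for k in range(1, n))
    return math.factorial(m) * 2 ** m * power_sum


def closed_forms(n):
    """The closed forms quoted in the text (valid for N > 5)."""
    return {
        (1, 0): -1, (0, 1): n - 1, (2, 0): n - 2, (0, 2): 2 * (n - 1),
        (1, 1): -2, (3, 0): -8, (4, 0): 6 * (3 * n - 8),
        (2, 1): 4 * (n - 2), (1, 2): -8,
    }


for (i, j), expected in closed_forms(20).items():
    print((i, j), round(kappa(i, j, 20), 6), expected)
```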

If we let $C' = C + 1$ and $V' = V - (N-1)$, all of these semi-invariants
will remain unchanged except that $\kappa_{10} = \kappa_{01} = 0$. Since $R = C/V$,

$$ R + \frac{1}{N-1} = \frac{C'(N-1) + V'}{[V' + (N-1)](N-1)} \approx \frac{C'(N-1) + V'}{(N-1)^2}. $$

If we neglect terms of order less than $1/N$, $E(R) = -1/(N-1)$, $\sigma_R^2 = (N-2)/(N-1)^2$,
and $\kappa_k = 0$ for $k > 2$. For $N < 75$, a more exact approximation may be desired.

If the above approximation is used, ${}_1R_N$ is normally distributed with mean
$-1/(N-1)$ and variance $(N-2)/(N-1)^2$. The single-tail significance
points can be found by substituting in the formulas

$$ {}_1R_N(.05) = \frac{-1 \pm 1.645\sqrt{N-2}}{N-1} \qquad\text{or}\qquad {}_1R_N(.01) = \frac{-1 \pm 2.326\sqrt{N-2}}{N-1}. $$

Refer to Fig. 2 for a comparison of the exact distribution and the normal
approximation. We might note a few comparisons between the approximate
significance points and the exact ones:


              Positive tail                       Negative tail
              5%                1%                5%                  1%
  N     Exact   Approx.   Exact   Approx.   Exact     Approx.   Exact     Approx.
  45    0.218   0.223     0.314   0.324     -0.262    -0.268    -0.356    -0.369
  75    0.173   0.176     0.250   0.255     -0.199    -0.203    -0.276    -0.282
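The normal approximation amounts to one line of arithmetic. The sketch below (hypothetical helper name) uses $z = \pm1.645$ and $\pm2.326$ and reproduces the Approx. entries above to within about 0.001 (it gives 0.2224 where the comparison prints 0.223 for $N = 45$).

```python
import math


def normal_approx_point(n, z):
    """Normal-approximation significance point of 1R_N: (-1 + z sqrt(N - 2)) / (N - 1).

    Use z = 1.645 or 2.326 for the upper 5% and 1% points, and -1.645 or -2.326
    for the corresponding lower points.
    """
    return (-1 + z * math.sqrt(n - 2)) / (n - 1)


for n in (45, 75):
    print(n, [round(normal_approx_point(n, z), 4) for z in (1.645, 2.326, -1.645, -2.326)])
```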






For ${}_1R_N$ not corrected for the mean, it was found that $y = \sqrt{N}\,{}_1R_N$ is
asymptotically normally distributed with mean 0 and variance 1 [1].

4. Significance points of ${}_1R_N$. An example of the methods used in tabulating
these significance points has been presented in the author's doctoral thesis [1].
The significance points for the values of $N$ enclosed in parentheses have been
obtained by graphical interpolation. Note that $N$ is the number of observations
(Table I).



5. Distributions for general lag, L. (a) Introduction. For a general lag, $L$,
the constants in the characteristic equation for the covariance ${}_LC_N$ are $a_0 =
-(\lambda + 1/N)$, $a_L = a_{N-L} = (N-2)/2N$ and all other $a$'s $= -1/N$. Hence
the characteristic equation is

$$ {}_LF_N = -\lambda\prod_{k=1}^{N-1}\left[\cos(2\pi Lk/N) - \lambda\right] = 0. $$

Certain important generalizations concerning ${}_LF_N$ may be set down.

1. When $L$ has no common factor with $N$, ${}_LF_N = {}_1F_N$.

2. When $L$ and $N$ have a common factor, $a$, ${}_LF_N = ({}_qF_p)^a(\lambda - 1)^{a-1}$, where $p = N/a$ and $q = L/a$.

2a. If $a = L$, ${}_LF_N = ({}_1F_p)^L(\lambda - 1)^{L-1}$, where $p = N/L$.

The proof of the first statement was suggested by Cochran. Since
$\cos(\alpha + 2\pi n) = \cos\alpha$, where $n$ is any integer, we must prove that the series of
numbers

$L, 2L, \ldots, (N-1)L,$

when reduced modulo $N$, can be arranged to form the series

$1, 2, \ldots, (N-1).$

This proof can be found in most books on the theory of numbers, e.g., [4]. Hence
we conclude that each term of the sequence $\{\cos(2\pi Lk/N)\}$ reduces uniquely

TABLE I

          Positive tail      Negative tail
  N       5%       1%        5%        1%
  5       0.253    0.297    -0.753    -0.798
  6       0.345    0.447    -0.708    -0.863
  7       0.370    0.510    -0.674    -0.799
  8       0.371    0.531    -0.625    -0.764
  9       0.366    0.533    -0.593    -0.737
  10      0.360    0.525    -0.565    -0.705
  11      0.353    0.515    -0.539    -0.679
  12      0.348    0.505    -0.516    -0.655
  13      0.341    0.495    -0.497    -0.631
  14      0.335    0.485    -0.479    -0.615
  15      0.328    0.475    -0.462    -0.597
  20      0.299    0.432    -0.399    -0.524
  25      0.276    0.398    -0.356    -0.473
  30      0.257    0.370    -0.325    -0.433
  (35)    0.242    0.347    -0.300    -0.401
  (40)    0.229    0.329    -0.279    -0.370
  45      0.218    0.314    -0.262    -0.356
  (50)    0.208    0.301    -0.248    -0.339
  (55)    0.199    0.289    -0.230    -0.321
  (60)    0.191    0.278    -0.225    -0.310
  (65)    0.184    0.268    -0.216    -0.298
  (70)    0.178    0.259    -0.207    -0.287
  75      0.173    0.250    -0.199    -0.276


to one of the sequence $\{\cos(2\pi k/N)\}$ for $k = 1, 2, \ldots, (N-1)$, when $L/N$
is a prime fraction.

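If $L$ and $N$ have a common factor, $a$, $L = qa$ and $N = pa$, where $p$ and $q$
are integers prime to one another. Hence,

$$ {}_LF_N = -\lambda\prod_{k=1}^{N-1}\left[\cos\frac{2\pi qk}{p} - \lambda\right] = -\lambda\left\{\prod_{k=1}^{p-1}\left[\cos\frac{2\pi qk}{p} - \lambda\right]\right\}^a(1 - \lambda)^{a-1} = 0, $$

so that, apart from the root $\lambda = 0$ belonging to the mean, the roots of ${}_LF_N$ are those of ${}_qF_p$,
each repeated $a$ times, together with $\lambda = 1$ repeated $a - 1$ times; this is the content of
${}_LF_N = ({}_qF_p)^a(\lambda - 1)^{a-1}$. If $a = L$, ${}_LF_N = ({}_1F_p)^L(\lambda - 1)^{L-1}$, where $p = N/L$.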
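The grouping of the roots can be confirmed by brute force. The sketch below (hypothetical helper names) compares the multiset of roots $\cos(2\pi Lk/N)$, $k = 1, \ldots, N-1$, with the grouping stated above, taking $a = \gcd(N, L)$ and $p = N/a$. Because $q$ is prime to $p$, the roots of ${}_qF_p$ are those of ${}_1F_p$. Values are rounded to 9 decimals so that equal cosines computed along different paths compare equal.

```python
import math
from collections import Counter


def lag_roots(n, lag, digits=9):
    """Multiset of the N - 1 roots cos(2 pi L k / N), k = 1, ..., N - 1."""
    return Counter(round(math.cos(2 * math.pi * lag * k / n), digits) for k in range(1, n))


def grouped_roots(n, lag, digits=9):
    """Roots predicted by the grouping: those of 1F_p each a times, plus a - 1 unit roots."""
    a = math.gcd(n, lag)
    p = n // a
    predicted = Counter()
    for k in range(1, p):
        predicted[round(math.cos(2 * math.pi * k / p), digits)] += a
    if a > 1:
        predicted[1.0] += a - 1
    return predicted


print(lag_roots(8, 6) == lag_roots(8, 2))        # lags 6 and 2 with N = 8
print(lag_roots(12, 8) == grouped_roots(12, 8))  # a = 4, p = 3, q = 2
```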





When the above results are applied to the large sample distribution of ${}_LR_N$, we
find that it is independent of $L$. For the more important case in which $p = N/L$,
the semi-invariants for $C$ and $V$ are exactly the same for all $L$ with a given $N$.
We see that

$$ K = -\tfrac12L\sum_{k=1}^{p-1}\log\left[1 - 2(t + s\lambda_k)\right] - \tfrac12(L-1)\log\left[1 - 2(t + s)\right], $$

where $\lambda_k = \cos(2\pi k/p)$. Hence, $\kappa_{ij} = m!\,2^m\left[L\left(\sum_{k=1}^{p-1}\lambda_k^{\,i} + 1\right) - 1\right]$. But
$\sum_{k=1}^{p-1}\lambda_k^{\,i} + 1$ is always 0 or a multiple of $p$ when $p > i$; therefore, the $p$'s cancel
and $\kappa_{ij}$ is the same for all $p$ or for all $L$, since $L = N/p$. When $p \le i$, the
$\kappa_{ij}$'s will not be equal for all $p$. For example, $\kappa_{20} = 2(N-1)$ for $p = 2$ and
$\kappa_{30} = 2(N-4)$ for $p = 3$.
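The same root-based formula, applied to the lag-$L$ roots $\cos(2\pi Lk/N)$, shows which semi-invariants depend on $p$. In this sketch (hypothetical helper `kappa_lag`, illustrative $N = 12$), $\kappa_{20}$ and $\kappa_{30}$ agree with the lag-1 values when $p > i$, and reproduce the two exceptions quoted above.

```python
import math


def kappa_lag(i, j, n, lag):
    """kappa_ij of (C, V) for lag L: m! 2^m sum_{k=1}^{N-1} cos^i(2 pi L k / N), m = i + j - 1."""
    m = i + j - 1
    power_sum = sum(math.cos(2 * math.pi * lag * k / n) ** i for k in range(1, n))
    return math.factorial(m) * 2 ** m * power_sum


n = 12
for lag, p in ((1, 12), (3, 4), (4, 3), (6, 2)):
    print(f"L={lag} p={p}: kappa_20={kappa_lag(2, 0, n, lag):.6f}, "
          f"kappa_30={kappa_lag(3, 0, n, lag):.6f}")
```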

(b) Distributions of ${}_LR_N$ when $N/L = p$. These results indicate that the
distributions of the serial correlation coefficients for which the number of
observations is divisible by the lag, so that $N/L = p$, would include the
distributions of all the serial correlation coefficients regardless of the values of $N$ and $L$.
We will designate any lag $L$ as the primary lag for a given $N$ if $N/L = p$, an
integer. For example, ${}_6R_8$ and ${}_2R_8$ have the same density function, but we will
derive only the density function for lag 2, which we will call the primary lag.
The case of $p = 1$ is trivial, since it involves correlating a series with itself.
To date, we have derived the exact density functions for $p = 2$ and $p = 3$
and the required integrals for $p = 4$. The significance points have been
tabulated in Table II. For simplicity of notation, we will set ${}_LR_N = {}_LR_p$ and
$V_N = V$.

Case $p = 2$ ($N = 2L$). ${}_LR_2V = -u_1 + u_2$ and $V = u_1 + u_2$, where $u_1$ is
distributed as $\chi^2$ with $L$ d.f. and $u_2$ as $\chi^2$ with $L - 1$ d.f. Hence,

$$ D(u_1, u_2) = K\,u_1^{(L-2)/2}u_2^{(L-3)/2}e^{-V/2}, $$

where $1/K = 2^{L-1/2}\,\Gamma(\tfrac12L)\,\Gamma[\tfrac12(L-1)]$. After substituting $u_1 = V(1 - {}_LR_2)/2$
and $u_2 = V(1 + {}_LR_2)/2$ and integrating with respect to $V$ from 0 to $\infty$, we have

$$ D({}_LR_2) = \frac{\Gamma(L - \tfrac12)\,(1 - {}_LR_2)^{(L-2)/2}(1 + {}_LR_2)^{(L-3)/2}}{2^{L-3/2}\,\Gamma(\tfrac12L)\,\Gamma[\tfrac12(L-1)]}. $$

If we set $(1 - R') = 2x$, then the cumulative probability function is

$$ P({}_LR_2 > R') = \frac{\Gamma(L - \tfrac12)}{\Gamma(\tfrac12L)\,\Gamma[\tfrac12(L-1)]}\int_0^x y^{(L-2)/2}(1 - y)^{(L-3)/2}\,dy. $$

Pearson has tabulated the values of these incomplete Beta functions [5]. In
his notation, $P = I_x[\tfrac12L, \tfrac12(L-1)]$, where $x = \tfrac12(1 - R')$. For ${}_LR_2$ not
corrected for the mean, $P = I_x(\tfrac12L, \tfrac12L)$ [1].

Case $p = 3$ ($N = 3L$). ${}_LR_3V = -\tfrac12u_1 + u$ and $V = u_1 + u$, where $u_1$ is
distributed as $\chi^2$ with $2L$ d.f. and $u$ with $L - 1$ d.f. Therefore, $D(u_1, u) =
K\,u_1^{L-1}u^{(L-3)/2}e^{-V/2}$, where $1/K = 2^{(3L-1)/2}\,\Gamma(L)\,\Gamma[\tfrac12(L-1)]$. After substituting
$u_1 = 2V(1 - {}_LR_3)/3$ and $u = V(1 + 2\,{}_LR_3)/3$ and integrating with respect to
$V$ from 0 to $\infty$, we find that

$$ D({}_LR_3) = \frac{2^L\,\Gamma[\tfrac12(3L-1)]\,(1 - {}_LR_3)^{L-1}(1 + 2\,{}_LR_3)^{(L-3)/2}}{3^{3(L-1)/2}\,\Gamma(L)\,\Gamma[\tfrac12(L-1)]}, \qquad {}_LR_3 \ge -\tfrac12. $$

If we set $x = 2(1 - R')/3$, $P({}_LR_3 > R') = I_x[L, \tfrac12(L-1)]$. For ${}_LR_3$ not
corrected for the mean, $P = I_x[L, \tfrac12L]$.
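Since both cases reduce to incomplete Beta functions, the upper points in Table II can be recomputed without Pearson's tables. In the sketch below (hypothetical helper names), a composite Simpson's-rule quadrature stands in for the tables. It is adequate only because the first Beta parameter is at least 1 and $x < 1$. The upper-tail point is then found by bisection. Only upper tails are checked here.

```python
import math


def reg_inc_beta(x, a, b, steps=2000):
    """Regularized incomplete beta function I_x(a, b) by composite Simpson's rule.

    Adequate for this check only: it assumes a >= 1 (integrand finite at 0)
    and 0 <= x < 1, and `steps` must be even.
    """
    if x <= 0.0:
        return 0.0

    def f(y):
        return y ** (a - 1) * (1 - y) ** (b - 1)

    h = x / steps
    total = f(0.0) + f(x)
    total += sum((4 if k % 2 else 2) * f(k * h) for k in range(1, steps))
    beta = math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    return total * h / 3 / beta


def tail_p2(r, lag):
    """P(LR_2 > r) = I_x[L/2, (L - 1)/2] with x = (1 - r)/2  (N = 2L)."""
    return reg_inc_beta((1 - r) / 2, lag / 2, (lag - 1) / 2)


def tail_p3(r, lag):
    """P(LR_3 > r) = I_x[L, (L - 1)/2] with x = 2(1 - r)/3  (N = 3L)."""
    return reg_inc_beta(2 * (1 - r) / 3, lag, (lag - 1) / 2)


def upper_point(tail, lag, alpha, lo):
    """Bisection on (lo, 1) for the r with tail(r, lag) = alpha; tail decreases in r."""
    hi = 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if tail(mid, lag) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(upper_point(tail_p2, 2, 0.05, -1.0), 3))  # compare Table II, p = 2, L = 2
print(round(upper_point(tail_p3, 3, 0.05, -0.5), 3))  # compare Table II, p = 3, L = 3
```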

Case $p = 4$ ($N = 4L$). ${}_LR_4V = -u_2 + u_1$ and $V = u_1 + u_2 + u$, where $u_2$
is distributed as $\chi^2$ with $L$ d.f., $u_1$ with $L - 1$ d.f. and $u$ with $2L$ d.f. The density
function of the $u$'s is $D(u_1, u_2, u) = K\,u_1^{(L-3)/2}u_2^{(L-2)/2}u^{L-1}e^{-V/2}$, where $1/K =
2^{(4L-1)/2}\,\Gamma(\tfrac12L)\,\Gamma[\tfrac12(L-1)]\,\Gamma(L)$. Since $u_1 = [V(1 + {}_LR_4) - u]/2$ and $u_2 =
[V(1 - {}_LR_4) - u]/2$, $0 < u < V(1 - {}_LR_4)$ for ${}_LR_4 > 0$ and $0 < u < V(1 + {}_LR_4)$
for ${}_LR_4 < 0$. For ${}_LR_4 > 0$,

$$ D({}_LR_4, V) = \frac{K\,Ve^{-V/2}}{2^{(2L-3)/2}}\int_{u=0}^{V(1 - {}_LR_4)}\left[V(1 + {}_LR_4) - u\right]^{(L-3)/2}\left[V(1 - {}_LR_4) - u\right]^{(L-2)/2}u^{L-1}\,du. $$

For ${}_LR_4 < 0$, $D({}_LR_4, V)$ is the same except that the upper limit for the integral is
$V(1 + {}_LR_4)$. If we make the substitution $y = u/(\text{upper limit})$ in each case and
then integrate with respect to $V$ from 0 to $\infty$, we have these density functions:

$$ D({}_LR_4) = k\,(1 + {}_LR_4)^{3(L-1)/2}\int_0^1 y^{L-1}(1 - y)^{(L-3)/2}\left[(1 - {}_LR_4) - y(1 + {}_LR_4)\right]^{(L-2)/2}dy, \qquad {}_LR_4 < 0, $$

$$ D({}_LR_4) = k\,(1 - {}_LR_4)^{(3L-2)/2}\int_0^1 y^{L-1}(1 - y)^{(L-2)/2}\left[(1 + {}_LR_4) - y(1 - {}_LR_4)\right]^{(L-3)/2}dy, \qquad {}_LR_4 > 0, $$

where $k = \Gamma[\tfrac12(4L-1)]/\{2^{(2L-3)/2}\,\Gamma(L)\,\Gamma(\tfrac12L)\,\Gamma[\tfrac12(L-1)]\}$.

The probability integrals must be evaluated for each $L$. The cumulative
probability functions for $L = 2$ and 3 are:

For $L = 2$ ($N = 8$):

$$ P({}_LR_4 > R') = \begin{cases} 1 - \dfrac{(1 + R')^{5/2}}{\sqrt2} + \dfrac{R'^{3/2}(5 + R')}{2}, & R' > 0, \\[2ex] 1 - \dfrac{(1 + R')^{5/2}}{\sqrt2}, & R' < 0. \end{cases} $$

For $L = 3$ ($N = 12$):

$$ P({}_LR_4 > R') = \begin{cases} \dfrac{(1 - R')^{9/2}}{2\sqrt2}, & R' \ge 0, \\[2ex] \dfrac{(1 - R')^{9/2}}{2\sqrt2} - \dfrac{(-R')^{5/2}(22R'^2 + 36R' + 126)}{16}, & R' < 0. \end{cases} $$

The density functions are much simpler for $R' > 0$ when $L$ is odd and for $R' < 0$ when $L$ is even;
for $L > 3$ we have derived only these significance points and have obtained the
intermediate points by interpolation. It was noted that the significance points for $p = 4$
are close to those given in Table I for the first lag. For comparisons see Table III below.
Note that for $L > 7$ the 5% points are almost identical and the 1% points are nearly
accurate to two decimal places.
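The two closed forms can be checked against the exact columns of Table III by inverting them numerically. The sketch below (hypothetical helper names) assumes the formulas exactly as written above, with $N = 8$ for $L = 2$ and $N = 12$ for $L = 3$, and solves $P = \text{target}$ by bisection on $(-1, 1)$.

```python
import math


def tail_p4_lag2(r):
    """P(LR_4 > r) for L = 2, N = 8, from the closed form."""
    base = 1 - (1 + r) ** 2.5 / math.sqrt(2)
    return base + r ** 1.5 * (5 + r) / 2 if r > 0 else base


def tail_p4_lag3(r):
    """P(LR_4 > r) for L = 3, N = 12, from the closed form."""
    base = (1 - r) ** 4.5 / (2 * math.sqrt(2))
    if r >= 0:
        return base
    return base - (-r) ** 2.5 * (22 * r * r + 36 * r + 126) / 16


def solve(tail, target, lo=-1.0, hi=1.0):
    """Bisection for tail(r) = target, with tail decreasing on [lo, hi]."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if tail(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(round(solve(tail_p4_lag2, 0.05), 3), round(solve(tail_p4_lag2, 0.01), 3))
print(round(solve(tail_p4_lag3, 0.95), 3), round(solve(tail_p4_lag3, 0.99), 3))
```

The lower 1% point for $L = 3$ sits where the density is small, so the bisection result there is sensitive to rounding in the printed value.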





TABLE II

Significance points of ${}_LR_N$ for p = 2 and 3¹

         p = 2 (N = 2L)                          p = 3 (N = 3L)
         Positive tail      Negative tail        Positive tail      Negative tail
  L      5%       1%        5%        1%         5%       1%        5%        1%
  2      0.805    0.960                          0.488    0.762    -0.496    -0.50
  3      0.729    0.907    -0.928    -0.994      0.447    0.677    -0.474    -0.490
  4      0.664    0.852    -0.848                         0.610    -0.439    -0.480
  5      0.612    0.802    -0.773    -0.902      0.373    0.559    -0.406    -0.460
  6      0.571    0.759    -0.712    -0.856      0.346    0.518    -0.377    -0.440
  7      0.536    0.721    -0.662    -0.812      0.324    0.485    -0.354    -0.420
  8      0.507    0.688    -0.620    -0.774                        -0.334    -0.402
  9      0.483    0.659    -0.585    -0.739               0.433    -0.316    -0.387
  10     0.462    0.634    -0.554                0.278    0.413    -0.301    -0.373
  12     0.428    0.590    -0.505    -0.656               0.380    -0.276    -0.347
  14     0.399    0.554    -0.467                0.239    0.353    -0.250    -0.326
  16     0.370    0.523    -0.436    -0.577      0.225    0.332    -0.240    -0.308
  18     0.357    0.498    -0.410    -0.546               0.314    -0.227    -0.293
  20     0.340    0.476    -0.389    -0.520               0.298    -0.215    -0.280
  25     0.308    0.432    -0.347                0.182    0.268    -0.193    -0.254
  30     0.282    0.398    -0.317    -0.431      0.167    0.245    -0.176    -0.234
  40     0.247    0.348    -0.273    -0.374               0.212    -0.153    -0.205
  50     0.222    0.314    -0.243    -0.335                        -0.136    -0.184


TABLE III

Significance points for p = 4

                Positive tail                       Negative tail
                5%                1%                5%                  1%
  L     N       Exact   Table I   Exact   Table I   Exact     Table I   Exact     Table I
  2     8       0.373   0.371     0.618   0.531               -0.625    -0.818    -0.764
  3     12              0.348     0.547   0.505     -0.528    -0.516    -0.692    -0.655
  4     16      0.325*  0.322     0.490*  0.466     -0.451    -0.447    -0.604    -0.580
  5     20      0.301   0.299     0.451   0.432     -0.402*   -0.399    -0.543*   -0.524
  6     24      0.281*            0.419*  0.404               -0.363    -0.497    -0.482
  7     28      0.264   0.264     0.392   0.380     -0.338*   -0.337    -0.460*   -0.448

¹ L is the lag and p = N/L.

* indicates interpolated values.






























Case $p > 4$. We have not set up any of the density functions for $p > 4$;
however, it appears that the significance points given for lag 1 would be
accurate enough for the higher lags. The exact significance points for lag 2 have
been derived for $p = 5$ and 7. The reader may note the close approximation
given by the significance points for lag 1 when $p = 7$. We hope to check the
lag 1 approximation for other lags in the near future.

TABLE IV

Some significance points for lag 2

                                 Positive tail      Negative tail
                                 5%       1%        5%        1%
  p = 5 (N = 10)    Exact        0.342    0.510
                    Approx.      0.360    0.525    -0.565    -0.705
  p = 7 (N = 14)    Exact        0.335    0.482    -0.470
                    Approx.      0.335    0.485    -0.479    -0.615

7. Summary.

1. The exact and large sample distributions have been derived for the serial
correlation coefficient for lag 1 and the exact significance points tabulated for
$N$, the number of observations, up to 75; for $N > 75$, the large sample
approximations can be used.

2. It has been noted that the distributions for any lag $L$ are the same as those
for lag 1 when $L$ and $N$ are prime to each other. In general the distribution of
the serial correlation coefficient can be derived for any $L$ and $N$ by using only
those distributions for which $L$ is a factor of $N$. The distributions and
significance points have been derived for $N/L = p = 2$, 3 and 4. For $p > 4$ ($N > 4L$),
the significance points given for lag 1 probably can be used when $L$ is greater
than 4 or 5. The accuracy of this approximation has been checked for lag 2.

3. These significance points should be useful in determining the methods of
studying a time series, as suggested by Wold, and in the formulation of a better
test of the significance of regression coefficients when we know that the
observations are correlated in time. In addition, we now have a method of testing our
assumptions of independence for any set of data.


REFERENCES

[1] R. L. Anderson, Serial Correlation in the Analysis of Time Series, unpublished thesis,
Library, Iowa State College, Ames, Iowa, 1941.

[2] M. S. Bartlett, "Some aspects of the time-correlation problem in regard to tests of
significance," Roy. Stat. Soc. Jour., Vol. 98 (1936), pp. 536-543.






[3] W. G. Cochran, "The distribution of quadratic forms in a normal system, with applications
to the analysis of covariance," Camb. Phil. Soc. Proc., Vol. 30 (1934), pp. 178-191.

[4] L. E. Dickson, Modern Elementary Theory of Numbers, University of Chicago Press.

[5] Karl Pearson (editor), Tables of the Incomplete Beta-Function, Cambridge University
Press, 1934.

[6] G. Tintner, "Tests of significance in time series," Annals of Math. Stat., Vol. 10 (1939),
pp. 139-143.

[7] G. Tintner, The Variate Difference Method, Principia Press, Bloomington, Indiana,
1940.

[8] H. Wold, A Study in the Analysis of Stationary Time Series, Almqvist and Wiksells
Boktryckeri A.-B., Uppsala, 1938.

[9] G. U. Yule, "On the time-correlation problem," Roy. Stat. Soc. Jour., Vol. 84 (1921),
pp. 497-537.



SERIAL CORRELATION AND QUADRATIC FORMS IN NORMAL VARIABLES¹

By Tjalling Koopmans

Penn Mutual Life Insurance Company

1. Estimation problems of stochastical processes. In regression analysis of economic time series, a situation often arises in which a certain observed quantity is a "dependent" variable at one time and an "independent" variable at a later time. For instance, the following relations may hold between the price $x_t$ and the supply $y_t$ of hogs at any time $t$:

(1) $x_t = \alpha - \beta y_t + z'_t,$
    $y_t = \gamma + \delta x_{t-1} + z''_t.$

The first equation expresses the price-depressing influence of large supplies. The second expresses the supply-stimulating influence of high prices one time unit earlier (in the case of hogs, about 18 months). The terms $z'_t$ and $z''_t$ represent the influences of additional variables and/or random disturbances. Eliminating $y_t$ gives

(2) $x_t = \epsilon - \zeta x_{t-1} + z_t,$

with $\epsilon = \alpha - \beta\gamma$, $\zeta = \beta\delta$ and $z_t = z'_t - \beta z''_t$.

The parameters $\epsilon$ and $\zeta$ of such an equation are usually estimated by ordinary least squares. This ignores the fact that the observation $x_t$ is both a dependent variable at time $t$ and an independent variable at time $t + 1$. The following simple example shows that this may lead to erroneous results, particularly in small samples. Suppose that $\epsilon = 0$ and $\zeta = -1$, that $z_t$ is a purely random variable with mean 0, and that only three successive observations are available. Apart from its sign, the least squares estimate of $\zeta$ is then the slope of the straight line connecting the points $(x_1, x_2)$ and $(x_2, x_3)$ in the plane of $x_{t-1}$ and $x_t$. This slope, however, has expected value 0. Under our assumptions, the conditional expectation of $x_3$ for a prescribed value of $x_2$ equals $x_2$, whatever that value is. Thus the least squares estimate of $\zeta = -1$ has expected value 0, which shows an important bias.
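Here is a quick simulation of the three-observation example. It is a sketch that assumes $\epsilon = 0$, $\zeta = -1$ and standard normal $z_t$, and `three_point_slope` is a hypothetical helper. The fitted slope works out to exactly $z_3/z_2$. Because that ratio is heavy-tailed, the sketch summarizes it by its median rather than its average.

```python
import random
import statistics

def three_point_slope(x1, x2, x3):
    """Least squares slope (with intercept) of x_t on x_{t-1} from the pairs
    (x1, x2) and (x2, x3); the fitted line passes through both points."""
    return (x3 - x2) / (x2 - x1)

rng = random.Random(0)
slopes = []
for _ in range(20_000):
    z1, z2, z3 = (rng.gauss(0.0, 1.0) for _ in range(3))
    x1 = z1              # arbitrary starting value
    x2 = x1 + z2         # epsilon = 0, zeta = -1:  x_t = x_{t-1} + z_t
    x3 = x2 + z3
    slopes.append(three_point_slope(x1, x2, x3))

# The true coefficient of x_{t-1} is -zeta = 1, but the slope equals z3 / z2,
# which is distributed symmetrically about 0.
print(statistics.median(slopes))
```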

Mathematical business cycle theories use systems of equations much more complicated than this example [1]. All of these equation systems, however, share one feature: they reduce the fluctuations in a set of economic variables to

1. earlier fluctuations in the same set of variables, 

2. changes in given non-economic or external variables, and 

3. random disturbances. 


¹ This investigation was carried out at the Local and State Government Section (Princeton Surveys) of the School of Public and International Affairs of Princeton University. [Remainder of footnote illegible.]




An equation system of this type has been said to define a stochastical process in a number of variables [2]. The statistical testing of mathematical business cycle theories therefore requires a theory for estimating the parameters of stochastical processes. Stochastical processes are also at work in meteorological data. If the random disturbances are assumed normally distributed, it will be seen that the mathematical prerequisite for such an estimation theory is the study of joint distributions of certain quadratic forms in normal variables.

This article treats only the simplest problem of this class: testing the significance of $\zeta$ in equation (2) when it is known that $|\zeta| < 1$ and that $\epsilon$ equals zero. This is the problem of testing the significance of single serial regression, or of single serial correlation. In this simple case the distinction between single regression and correlation coefficients disappears for coefficients smaller than unity in absolute value.

The next section states the problem of estimating single serial correlation when the mean is known, and discusses the difficulties involved. Section 3 derives a conditional distribution of a quadratic form in normal variables. The proof in section 3 covers only forms in five or more variables, but section 4 gives another proof that covers any number of variables. Section 5 then applies this distribution to devise a test of significance of serial correlation. Section 4 need not be read to understand section 5. Readers who want only the main results can find them in equations (3), (11), (16), (21), (36), (61), (62), (74), (79), (82), (92) and (96).

2. The estimation of serial correlation. Consider the stochastical process

(3) $x_t = \rho x_{t-1} + z_t,$

where the $z_t$ are independent drawings from a normal distribution with mean 0 and standard deviation $\sigma$. Here the parameter $\rho$ may take any positive or negative value. The process is stationary only if

(4) $|\rho| < 1.$

For, since

(5) $E x_t = E x_{t-1} = E z_t = 0, \qquad E z_t^2 = \sigma^2,$

and

(6) $E x_t^2 = \rho^2 E x_{t-1}^2 + \sigma^2,$

a variance of $x_t$ independent of $t$ is possible only if (4) is satisfied. In that case

(7) $E x_t^2 = \dfrac{\sigma^2}{1 - \rho^2}.$





If (4) is not satisfied, however, $E x_t^2$ increases with $t$ and tends to infinity, in approximately geometric progression if $|\rho| > 1$. In this article the limitation (4) is imposed a priori.
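Iterating (6) directly shows the two regimes. This is a minimal sketch with illustrative values ($\rho = 0.6$ and $\rho = 1.05$, $\sigma = 1$, starting from $E x_0^2 = 0$), and `second_moments` is a hypothetical helper name.

```python
def second_moments(rho, sigma, steps, v0=0.0):
    """Iterate (6): E x_t^2 = rho^2 E x_{t-1}^2 + sigma^2, starting from E x_0^2 = v0."""
    v, out = v0, []
    for _ in range(steps):
        v = rho * rho * v + sigma * sigma
        out.append(v)
    return out

stable = second_moments(0.6, 1.0, 200)
explosive = second_moments(1.05, 1.0, 200)
print(stable[-1], 1.0 / (1.0 - 0.6 ** 2))   # should agree closely: sigma^2 / (1 - rho^2)
print(explosive[-1] / explosive[-2])         # should be close to rho^2 = 1.1025
```

For $|\rho| < 1$ the sequence settles at the stationary value. For $|\rho| > 1$ the ratio of successive terms approaches $\rho^2$, which is the geometric growth mentioned above.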
From (3) and (7), and the assumption that $z_2, \ldots, z_T$ are independent of each other and of $x_1$, the joint distribution of the quantities $x_1, z_2, \ldots, z_T$ is

(8) $\dfrac{(1-\rho^2)^{1/2}}{(\sqrt{2\pi}\,\sigma)^T} \exp\Big\{-\dfrac{1}{2\sigma^2}\Big[(1-\rho^2)x_1^2 + \sum_{t=2}^{T} z_t^2\Big]\Big\}\, dx_1\, dz_2 \cdots dz_T.$

The Jacobian of the transformation (3) from the variables $x_1, z_2, \ldots, z_T$ to the variables $x_1, x_2, \ldots, x_T$ equals unity. The joint distribution of the $T$ successive observations $x_1, x_2, \ldots, x_T$ that make up a sample is therefore obtained by replacing each $z_t$ in (8) by the corresponding expression $x_t - \rho x_{t-1}$ from (3). This leads to the distribution


(9) $\dfrac{(1-\rho^2)^{1/2}}{(\sqrt{2\pi}\,\sigma)^T} \exp\Big\{-\dfrac{1}{2\sigma^2}\big[-\rho^2 l - 2\rho m + (1+\rho^2) n\big]\Big\}\, dx_1 \cdots dx_T,$

in which the three quadratic forms

(10) $l = x_1^2 + x_T^2,$
     $m = x_1 x_2 + x_2 x_3 + \cdots + x_{T-1} x_T,$
     $n = x_1^2 + x_2^2 + \cdots + x_T^2,$

are the only characteristics of the sample that enter. In other words, $l$, $m$ and $n$ are jointly sufficient statistics for estimating $\rho$ and $\sigma$. These statistics remain the same if the series of observations is taken in inverse order.
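The sufficiency statement can be checked by evaluating the density in two ways. The first uses only $l$, $m$, $n$, as in (9). The second builds it term by term from (8) with $z_t = x_t - \rho x_{t-1}$. This sketch uses made-up data, and the function names are hypothetical.

```python
import math

def lmn(x):
    """Statistics (10): l = x_1^2 + x_T^2, m = sum x_t x_{t+1}, n = sum x_t^2."""
    l = x[0] ** 2 + x[-1] ** 2
    m = sum(a * b for a, b in zip(x, x[1:]))
    n = sum(v * v for v in x)
    return l, m, n

def loglik_lmn(rho, sigma, x):
    """Log of the density (9), computed from (l, m, n) only."""
    T = len(x)
    l, m, n = lmn(x)
    quad = -rho ** 2 * l - 2 * rho * m + (1 + rho ** 2) * n
    return (0.5 * math.log(1 - rho ** 2)
            - T * math.log(math.sqrt(2 * math.pi) * sigma)
            - quad / (2 * sigma ** 2))

def loglik_direct(rho, sigma, x):
    """Same density built from (8): x_1 ~ N(0, sigma^2 / (1 - rho^2)) and
    z_t = x_t - rho x_{t-1} ~ N(0, sigma^2), all independent."""
    def log_normal(v, sd):
        return -0.5 * math.log(2 * math.pi * sd * sd) - v * v / (2 * sd * sd)
    total = log_normal(x[0], sigma / math.sqrt(1 - rho ** 2))
    for prev, cur in zip(x, x[1:]):
        total += log_normal(cur - rho * prev, sigma)
    return total

x = [0.3, -1.2, 0.8, 2.0, -0.5, 0.1]
print(loglik_lmn(0.4, 1.3, x), loglik_direct(0.4, 1.3, x))  # should agree
```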

It seems natural to attempt maximum likelihood estimation of $\rho$ and $\sigma$, even though the usual optimal properties of such estimates have not been proved for stochastical processes. Straightforward calculation leads to the following third-degree equation for the maximum likelihood estimate $\hat\rho$ of $\rho$:

(11) $(m - \hat\rho n + \hat\rho l)(1 - \hat\rho^2) - T^{-1}\hat\rho\big[-\hat\rho^2 l - 2\hat\rho m + (1+\hat\rho^2) n\big] = 0.$

Of course, the root that asymptotically approaches $m/n$ must be selected. The corresponding maximum likelihood estimate $\hat\sigma$ of $\sigma$ is given by

(12) $\hat\sigma^2 = T^{-1}\big[-\hat\rho^2 l - 2\hat\rho m + (1+\hat\rho^2) n\big].$
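Equation (11) can be solved numerically. The sketch below assumes the cubic has the form obtained by differentiating the logarithm of (9) and substituting (12). It uses bisection on $(-1, 1)$. At $\rho = -1$ the left hand member (times $T$) equals $\sum (x_t + x_{t-1})^2 \ge 0$, and at $\rho = 1$ it equals $-\sum (x_t - x_{t-1})^2 \le 0$, so a sign change is bracketed. The simulated data, parameter values and helper names are illustrative assumptions.

```python
import random

def lmn(x):
    """Statistics (10)."""
    l = x[0] ** 2 + x[-1] ** 2
    m = sum(a * b for a, b in zip(x, x[1:]))
    n = sum(v * v for v in x)
    return l, m, n

def ml_equation(rho, l, m, n, T):
    """T times the left hand member of (11)."""
    quad = -rho ** 2 * l - 2 * rho * m + (1 + rho ** 2) * n
    return T * (m - rho * n + rho * l) * (1 - rho ** 2) - rho * quad

def ml_estimate(l, m, n, T, tol=1e-12):
    """Bisection for a root of (11) in (-1, 1), then sigma^2 from (12)."""
    lo, hi = -1.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if ml_equation(mid, l, m, n, T) > 0:
            lo = mid
        else:
            hi = mid
    rho_hat = 0.5 * (lo + hi)
    sigma2_hat = (-rho_hat ** 2 * l - 2 * rho_hat * m + (1 + rho_hat ** 2) * n) / T
    return rho_hat, sigma2_hat

rng = random.Random(3)
rho, sigma, T = 0.5, 1.0, 2000
x = [rng.gauss(0.0, sigma / (1 - rho ** 2) ** 0.5)]   # stationary start, as in (8)
for _ in range(T - 1):
    x.append(rho * x[-1] + rng.gauss(0.0, sigma))     # process (3)
l, m, n = lmn(x)
rho_hat, sigma2_hat = ml_estimate(l, m, n, T)
print(rho_hat, m / n, sigma2_hat)   # rho_hat should lie close to m / n for large T
```

Bisection returns one root in the interval. It does not choose among several roots, so it relies on the selection rule stated above in the usual case where there is only one.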

Because $\hat\rho$ is defined in such a complicated way, it seems desirable as a first step to derive from (9) the joint probability distribution of $l$, $m$ and $n$. This requires transforming the volume element $dx_1 \cdots dx_T$ in (9) to the form

(13) $\Phi(l, m, n)\, dl\, dm\, dn,$

which it assumes after integration over $T - 3$ other coordinates whose variation does not change $l$, $m$ and $n$.

This is purely a problem of integration, completely defined by the expressions (10), so the resulting function $\Phi(l, m, n)$ is independent of $\rho$ and $\sigma$. The joint distribution

(14) $\dfrac{(1-\rho^2)^{1/2}}{(\sqrt{2\pi}\,\sigma)^T} \exp\Big\{-\dfrac{1}{2\sigma^2}\big[-\rho^2 l - 2\rho m + (1+\rho^2) n\big]\Big\}\, \Phi(l, m, n)\, dl\, dm\, dn$

of $l$, $m$ and $n$ will thus be known for any values of $\rho$ and $\sigma$ as soon as it is known for one particular pair of values.

If we choose the particular values $\rho = 0$ and $\sigma = 1$, the $x_t$ become identical with the $z_t$. The problem is then to find the joint distribution of the quadratic forms (10) in independent normal variables with mean 0 and variance 1. Even in this simplified form the problem is complicated. There are infinitely many common sets of principal axes of the forms $l$ and $n$, but none of these sets has a single axis in common with $m$.

No solution to this problem is offered here, but the following suggestion may be ventured. Once $\Phi(l, m, n)$ is known, the mathematically simplest procedure for interval estimation of $\rho$ might well be one that confines attention to samples having the same values of $l$ and $n$ as the sample actually obtained. With the help of (14), suitably chosen percentiles of the conditional distribution of $m$, with $l$ and $n$ fixed at their observed values, could be converted into confidence limits for $\rho$.

A simpler mathematical problem arises in testing whether a difference between $\rho$ and 0 can be established, that is, in testing the significance of serial correlation. If $\rho = 0$, the exponential factor in (14) depends only on $n$, not on $l$ or $m$. Exact significance limits for $m/n$ can then be derived from the joint distribution

(15) $\dfrac{1}{(\sqrt{2\pi}\,\sigma)^T}\, e^{-n/(2\sigma^2)}\, \Psi(n, m)\, dn\, dm, \qquad \Psi(n, m) = \int \Phi(l, m, n)\, dl,$

of $n$ and $m$ alone. This distribution is studied in the next three sections. It is hoped that the methods used there will provide a useful starting point for treating other problems of the class described in section 1.


3. Distribution of a quadratic form in normal variables on the unit sphere. Consider two quadratic forms in $T$ independent normal variables with mean 0 and variance 1,

(16) $p = x_1^2 + x_2^2 + \cdots + x_T^2, \qquad q = \sum_{i,j=1}^{T} a_{ij}\, x_i x_j \quad (a_{ij} = a_{ji}).$

The characteristic values of the form $p$ all coincide with the value 1. The characteristic values $\kappa_t$ of $q$ are provisionally supposed to differ from each other, so that they can be arranged in decreasing order





(17) $\kappa_1 > \kappa_2 > \cdots > \kappa_T.$

The probability density

(18) $(2\pi)^{-T/2} e^{-p/2}$

in the space of the variables $x_t$ is constant on any sphere

(19) $p = p_0 = \text{constant},$

while the distribution function $g(p)$ of $p$ is that of the $\chi^2$-distribution with $T$ degrees of freedom,

(20) $g(p) = \dfrac{p^{T/2 - 1} e^{-p/2}}{2^{T/2}\,\Gamma(T/2)}.$

The hypersurfaces on which the ratio

(21) $r = \dfrac{q}{p}$

of $q$ to $p$ is constant are cones with vertex at the origin, and each cuts out the same proportion of the metric "surface" of every sphere (19). Hence the conditional distribution function of $r$ for a prescribed value $p_0$ of $p$ does not depend on $p_0$, and therefore equals the unrestricted distribution function $h(r)$ of $r$. In other words, $p$ and $r$ are independently distributed. Their joint distribution being

(22) $g(p)\, h(r)\, dp\, dr,$

the joint distribution of $p$ and $q = rp$ is found to be

(23) $f(p, q)\, dp\, dq = g(p)\, h\!\left(\dfrac{q}{p}\right) \dfrac{dp\, dq}{p}.$

The function $h(\ )$ may therefore also be described as the conditional distribution function of $q$ on the unit sphere

(24) $p = 1.$

Since $\kappa_1$ and $\kappa_T$ are the extreme values of $q$ under the condition (24), the function $h(r)$ vanishes outside these limits.
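The independence of $p$ and $r$ can be illustrated by simulation in the principal axes of $q$, where $q = \sum \kappa_t x_t^2$. An orthogonal rotation leaves the density (18) unchanged, so this loses no generality. The characteristic values below are hypothetical, and the comparison of conditional means is only a rough check of independence, not a proof.

```python
import random
import statistics

kappas = [2.0, 1.0, 0.5, -0.3, -1.0]   # hypothetical characteristic values of q
rng = random.Random(4)
ps, rs = [], []
for _ in range(40_000):
    sq = [rng.gauss(0.0, 1.0) ** 2 for _ in kappas]
    p = sum(sq)                                   # (16)
    q = sum(k * s for k, s in zip(kappas, sq))    # q in its principal axes
    ps.append(p)
    rs.append(q / p)                              # (21)

# r never leaves [kappa_T, kappa_1] ...
print(min(rs) >= min(kappas), max(rs) <= max(kappas))
# ... and its distribution does not shift with p: compare small-p and large-p halves.
cut = statistics.median(ps)
low = [r for p, r in zip(ps, rs) if p < cut]
high = [r for p, r in zip(ps, rs) if p >= cut]
print(statistics.mean(low), statistics.mean(high))
```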

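We now derive an expression for $h(r)$ by comparing (23) with an expression for $f(p, q)$ obtained from the inversion theorem for characteristic functions. The characteristic function $F(\eta, \theta)$ of the variables $p$ and $q$ is

(25) $F(\eta, \theta) = (2\pi)^{-T/2} \int \cdots \int e^{i\eta p + i\theta q - p/2}\, dx_1 \cdots dx_T = D(\eta, \theta)^{-1/2},$

where, according to (16), the polynomial $D(\eta, \theta)$ is given by

(26) $D(\eta, \theta) = \big|(1 - 2i\eta)\,\delta_{ij} - 2i\theta\, a_{ij}\big| = \prod_{t=1}^{T} (1 - 2i\eta - 2i\theta\kappa_t).$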





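It follows from the inversion theorem that

(27) $f(p, q) = (2\pi)^{-2} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-i\eta p - i\theta q}\, D(\eta, \theta)^{-1/2}\, d\eta\, d\theta,$

where the order of integration over $\eta$ and $\theta$ is immaterial. Any elementary factor of $D(\eta, \theta)$ may be written

(28) $d_t(\eta, \theta) = 1 - 2i\eta - 2i\theta\kappa_t = (1 - 2i\eta)\Big(1 - \dfrac{2i\theta\kappa_t}{1 - 2i\eta}\Big).$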




[Figure 1. Paths of integration in the $k$-plane]


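Consider first the integration over $\theta$, with $\eta$ held fixed. Instead of $\theta$ we may use

(29) $k = \dfrac{1 - 2i\eta}{2i\theta}$

as the integration variable. The path of integration $c_\eta$ in the $k$-plane is then a straight line from 0 to $-\dfrac{1 - 2i\eta}{2i}\,\infty$, followed by another straight line from $\dfrac{1 - 2i\eta}{2i}\,\infty$ back to 0, as indicated in Figure 1. The transformed integral (27) reads

(30) $f(p, q) = (2\pi)^{-2} \int_{-\infty}^{\infty} e^{-i\eta p}(1 - 2i\eta)^{-\frac12 T + 1} \int_{c_\eta} e^{-\frac{(1-2i\eta)q}{2k}} \Big\{\prod_{t=1}^{T}\Big(1 - \dfrac{\kappa_t}{k}\Big)\Big\}^{-1/2} (-2ik^2)^{-1}\, dk\, d\eta.$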






The integrand

(31) $e^{-\frac{(1-2i\eta)q}{2k}} \Big\{\prod_{t=1}^{T}\Big(1 - \dfrac{\kappa_t}{k}\Big)\Big\}^{-1/2} (-2ik^2)^{-1}$

of the integration over $k$ has singularities only at the points $k = 0$ and $k = \kappa_t$, $t = 1, 2, \ldots, T$. To simplify the argument we suppose that the quadratic form $q$ is positive definite, or, in view of (17), that $\kappa_T > 0$. The singularities are then located as pictured in Figure 1. At $k = \infty$ the integrand (31) is regular and of order $k^{-2}$. Consequently, a curve integral of (31) along the whole or any part of the circle $|k| = R$ tends to 0 as $R$ tends to infinity. By Cauchy's theorem we may therefore replace the path $c_\eta$ in (30) by another path $c'_\eta$, provided that $R > \kappa_1$. This path starts along $c_\eta$ from 0 up to $-\frac{1-2i\eta}{2i}R$. From there it follows the circle $|k| = R$ to the right through an angle $\pi$ up to the point $\frac{1-2i\eta}{2i}R$, and from there it returns to 0 along $c_\eta$. We reverse the direction in which this path is followed, to remove the negative sign in (31). The path so obtained can again be replaced by the path $\gamma_\eta$ shown in Figure 1. This path coincides with $c'_\eta$ only up to a small distance $\delta$ from the real axis, and it encircles all singularities $\kappa_t$ while keeping a distance $\delta$ from the part of the real axis up to $\kappa_1$. Finally, letting $\delta \to 0$ gives a path of integration $\gamma'$ that is independent of $\eta$. This path amounts to integrating twice along the part of the real axis between 0 and $\kappa_1$. From 0 to $\kappa_1$ we integrate the branch of the integrand obtained by passing "under" each singularity. From $\kappa_1$ back to 0 we use the branch obtained by passing "around" $\kappa_1$ and "over" each other singularity.² The integral so obtained converges at each singularity. This also holds for the singularity $k = 0$, because only positive values of $q$ are involved, so the exponential factor in (31) tends to 0 as $k$ approaches zero. We now show that if $\gamma'$ is substituted for $c_\eta$ in (30) (with a change in sign), the order of integration over $k$ and $\eta$ can be reversed if $T \ge 5$.

The integral over $\eta$ taken from (30),

(32) $I = (2\pi)^{-1} \int_{-\infty}^{\infty} e^{-i\eta(p - q/k)} (1 - 2i\eta)^{-\frac12 T + 1}\, d\eta$

(in which $k$ is now a positive real number), becomes, under the substitution $u = p - q/k$, the integral met in deriving the $\chi^2$-distribution with $T - 2$ degrees of freedom from the inversion theorem for characteristic functions. It may be quoted without proof (see [3], p. 42) that it equals

(33) $I = \dfrac{(p - q/k)^{\frac12 T - 2}\, e^{-\frac12 (p - q/k)}}{2^{\frac12 T - 1}\,\Gamma(\frac12 T - 1)}$  if $p - q/k > 0$, or $k > r$,
     $I = 0$  if $p - q/k < 0$, or $k < r$.

² For even values of $T$, the parts of $\gamma'$ for which $k < \kappa_T$ can be disregarded, because there the same branch of the integrand is integrated in opposite directions.





It must be observed, however, that the integral $I$ converges uniformly for all real values of $k$ whenever $T \ge 5$, because then

(34) $\int_{-\infty}^{\infty} |1 - 2i\eta|^{-\frac12 T + 1}\, d\eta$

converges. Because of this property, the order of integration may be reversed for $T \ge 5$.

If we now first carry out the integration over $\eta$ in (30) and use (33), we are left with

(35) $f(p, q) = \dfrac{p^{\frac12 T - 2}\, e^{-\frac12 p}}{2^{\frac12 T - 1}\,\Gamma(\frac12 T - 1)} \cdot \dfrac{1}{4\pi i} \int_{\gamma_r} \Big(1 - \dfrac{r}{k}\Big)^{\frac12 T - 2} \Big\{\prod_{t=1}^{T}\Big(1 - \dfrac{\kappa_t}{k}\Big)\Big\}^{-1/2} \dfrac{dk}{k^2}.$

Here $\gamma_r$ is any curve that starts at $k = r$, enters the lower half-plane, crosses the real axis at a point $k > \kappa_1$, and returns to $k = r$ through the upper half-plane, as indicated in Figure 2. (The path obtained directly is a path $\gamma'_r$ that runs twice along the real axis between $r$ and $\kappa_1$, with the branches of the integrand taken as indicated by $\gamma_r$.)

[Figure 2. The integration path $\gamma_r$]

Comparing (35) with (23) and (20), and using (21) and the well-known formula $\Gamma(x) = (x - 1)\Gamma(x - 1)$, we find the following expression for the distribution function of $r$:


(36) $h(r) = \dfrac{\frac12 T - 1}{2\pi i} \int_{\gamma_r} \dfrac{(k - r)^{\frac12 T - 2}}{\prod_{t=1}^{T}(k - \kappa_t)^{1/2}}\, dk.$

This function vanishes for $r \ge \kappa_1$. To obtain a positive distribution function for $\kappa_T < r < \kappa_1$, we must select the branch of the integrand that is positive for real values of $k$ exceeding $\kappa_1$.

Note that the degree in $k$ of the numerator of the integrand is two less than that of the denominator. It is precisely because of this that $h(r)$ satisfies the two obvious conditions

(37) $h(r) = 0$ for $r \ge \kappa_1$, $\qquad \int_{\kappa_T}^{\kappa_1} h(r)\, dr = 1.$

For $r \ge \kappa_1$, the path of integration in (36) can be replaced by any closed contour enclosing all the singularities $r, \kappa_1, \ldots, \kappa_T$ ($r$ is a singularity only if $T$ is odd). Taking this contour to be the circle $|k| = R$ with $R$ tending to infinity, we find $h(r) = 0$, because the integrand is of order $k^{-2}$ at $k = \infty$. Further, if $\gamma_r$ is again replaced by $\gamma'$, which runs entirely along the real axis,






$\int_{\kappa_T}^{\kappa_1} h(r)\, dr = \dfrac{1}{2\pi i} \int_{\gamma'} \dfrac{(k - \kappa_T)^{\frac12 T - 1}}{\prod_{t=1}^{T}(k - \kappa_t)^{1/2}}\, dk = 1,$

because the integrand in the last integral is of order $k^{-1}$ in the limit $k \to \infty$.

The quantities $r$ and $\kappa_t$ enter the right hand member of (36) only as differences from the integration variable. Adding a constant $\epsilon$ to $r$ and to all the $\kappa_t$ therefore merely shifts the distribution along the $r$-axis without changing its form:

(38) $h^*(r + \epsilon) = h(r).$

This is to be expected, since such a transformation amounts to adding $\epsilon p$ to the quadratic form $q$. It follows that (36) is not limited to positive definite quadratic forms $q$, since any other quadratic form can be made positive definite by this operation if a sufficiently large $\epsilon$ is taken.

The function $h(r)$ is a different analytic function between any two successive distinct characteristic values $\kappa_t$ and $\kappa_{t+1}$. The expression (36) holds for even and for odd values of $T$ alike. It also remains valid for any number of coincidences in the set of characteristic values $\kappa_t$. Admittedly, integration along paths $\gamma'$ or $\gamma'_r$ that lie entirely on the real axis, as used in intermediate stages of the above proof, is impossible if two or more of the $\kappa_t$ coincide, because the integral then diverges. Once (36) has been established for distinct characteristic values, however, considerations of continuity show that it also holds when coincidences occur in the set $\kappa_t$.

Von Neumann [4] has studied the function $h(r)$ by an entirely different and very ingenious method. He treated the special case where $T$ is even and no two characteristic values are equal, and the case where the characteristic values are equal in pairs but otherwise distinct. The properties he established, and some generalizations of them, can be derived from (36). If $T$ is even, the derivative of $h(r)$ of order $\frac12 T - 1$ is

(39) $h^{(\frac12 T - 1)}(r) = \dfrac{(\frac12 T - 1)!\,(-1)^{\frac12 T - 1 + \frac12(t - 1)}}{\pi \prod_{j=1}^{T} |r - \kappa_j|^{1/2}}$  if $\kappa_{t+1} < r < \kappa_t$ and $t$ is odd;
     $h^{(\frac12 T - 1)}(r)$ does not exist for $r = \kappa_t$, $t = 1, 2, \ldots, T$;
     $h^{(\frac12 T - 1)}(r) = 0$ for all other values of $r$.





If all characteristic values are distinct, all derivatives of order lower than $\frac12 T - 1$ exist and are continuous everywhere. More generally, whether $T$ is even or odd, at a point where $k$ characteristic values coincide, $h^{(j)}(r)$ exists and is continuous if $j < \frac12(T - k) - 1$, and does not exist if $j \ge \frac12(T - k) - 1$.

If the characteristic values are pairwise equal,

(40) $\kappa_{2s-1} = \kappa_{2s} = \lambda_s, \qquad s = 1, 2, \ldots, S,$

but otherwise distinct, their total number $T = 2S$ must be even. The only singularities of the integrand in (36) are then poles at the points $k = \lambda_s$. Accordingly, the path of integration $\gamma_r$ can be treated as a closed curve, and the integral in (36) can be replaced by the sum of the residues of the integrand at all poles inside the curve:

(41) $h(r) = (S - 1) \sum_{s=1}^{\sigma} \dfrac{(\lambda_s - r)^{S-2}}{P'(\lambda_s)}$  if $\lambda_{\sigma+1} < r < \lambda_\sigma$.

Here $P'(\lambda)$ is the derivative of

(42) $P(\lambda) = \prod_{s=1}^{S} (\lambda - \lambda_s),$

and its value at the point $\lambda = \lambda_s$ is

(43) $P'(\lambda_s) = \prod_{u \ne s} (\lambda_s - \lambda_u).$

For $S = 2$ this is simply the rectangular distribution

(44) $h(r) = \dfrac{1}{\lambda_1 - \lambda_2}, \qquad \lambda_2 < r < \lambda_1.$
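Formula (41) is straightforward to evaluate. The sketch below uses the hypothetical values $\lambda = (1, 0, -1)$, for which (41) reduces to the triangular density $1 - |r|$. It compares one tail area with a Monte Carlo estimate built directly from the definition $r = q/p$. Both function names are illustrative.

```python
import random

def h_pairwise(r, lam):
    """Density (41) when the characteristic values are equal in pairs.
    lam = [lambda_1, ..., lambda_S] in decreasing order, each of multiplicity 2."""
    S = len(lam)
    if not lam[-1] < r < lam[0]:
        return 0.0
    total = 0.0
    for s, ls in enumerate(lam):
        if ls <= r:                   # only poles lambda_s > r lie inside gamma_r
            break
        dP = 1.0
        for u, lu in enumerate(lam):
            if u != s:
                dP *= ls - lu         # P'(lambda_s), equation (43)
        total += (ls - r) ** (S - 2) / dP
    return (S - 1) * total

def mc_upper_tail(lam, threshold, draws, seed=0):
    """Monte Carlo estimate of P(q/p > threshold) for q = sum_s lambda_s (x_s^2 + y_s^2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        p = q = 0.0
        for ls in lam:
            w = rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
            p += w
            q += ls * w
        hits += q / p > threshold
    return hits / draws

lam = [1.0, 0.0, -1.0]
grid_n = 4000
tail_from_41 = sum(h_pairwise(0.5 + (i + 0.5) * 0.5 / grid_n, lam)
                   for i in range(grid_n)) * 0.5 / grid_n
mc = mc_upper_tail(lam, 0.5, 100_000)
print(tail_from_41, mc)   # both should be near 0.125
```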

The numerical calculation of the distribution (36) with distinct characteristic 
values is extremely cumbersome except for very small values of T . If the 
characteristic values follow some definite pattern, however, it may be possible 
in some instances to work out a reasonable approximation formula. Two examples of this type will be shown in section 5.


4. Another proof that covers also cases with $T < 5$. The proof of (36) given above holds only for $T \ge 5$. Once the form of (36) is known or presumed, however, another proof of its validity is available. It is of mathematical interest in itself and covers all cases from $T = 2$ upwards. This is a proof by complete induction, based on the proposition that if (36) holds for $T$ variables, it also holds for $T + 1$ variables. This proposition in turn rests on the recurrent relation





(45) $h_{T+1}(r') = \dfrac{\Gamma(\frac12 T + \frac12)}{\Gamma(\frac12)\,\Gamma(\frac12 T)} \int |r' - \kappa_{T+1}|^{\frac12 T - 1}\, |r - r'|^{-1/2}\, |r - \kappa_{T+1}|^{-\frac12 T + \frac12}\, h_T(r)\, dr,$

where the integration extends over those values of $r$ for which $r'$ lies between $\kappa_{T+1}$ and $r$. This relation is proved elsewhere in this issue by von Neumann³ [5]. It connects the distribution function $h_T(r)$ for $T$ variables with the function $h_{T+1}(r')$ obtained by adding one variable $x_{T+1}$ and one characteristic value $\kappa_{T+1}$. We shall substitute the "presumed" expression (36) for $h_T(r)$, with $T \ge 3$, in (45), and show that the result for $h_{T+1}(r')$ is the same expression with $T$ increased by one. For simplicity, the proof assumes that the new characteristic value $\kappa_{T+1}$ is smaller than all of those already present, and that no two of the $\kappa_t$ are equal. It is then again possible to use in (36) the path of integration $\gamma'_r$. This path runs along the real axis from $r$ to $\kappa_1$ and returns along the real axis to $r$, passing each singularity in the same way as $\gamma_r$ does. If the integral (36) is substituted in (45) in this form, the order of integration over $k$ and $r$ can be reversed, with the result


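(46) $h_{T+1}(r') = \dfrac{\Gamma(\frac12 T + \frac12)}{\Gamma(\frac12)\Gamma(\frac12 T)} \cdot \dfrac{\frac12 T - 1}{2\pi i}\,(r' - \kappa_{T+1})^{\frac12 T - 1} \int_{\gamma'_{r'}} \Big\{\prod_{t=1}^{T}(k - \kappa_t)\Big\}^{-1/2} \Big[\int_{r'}^{k} (r - \kappa_{T+1})^{-\frac12 T + \frac12} (r - r')^{-1/2} (k - r)^{\frac12 T - 2}\, dr\Big]\, dk.$

For greater clarity we write $\kappa_{T+1} = a$, $r' = b$, $k = c$, $r = z$. We then have to evaluate the integral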


(47) $I_T(a, b, c) = \int_b^c (z - a)^{-\frac12 T + \frac12} (z - b)^{-1/2} (c - z)^{\frac12 T - 2}\, dz, \qquad a < b < c, \quad T \ge 3,$

with the positive square roots taken when $z$ is real and $b < z < c$. Suppose first that $T = 2S + 1$ is odd. Then the integrand

(48) $\phi_{2S+1}(z) = (z - a)^{-S} (z - b)^{-1/2} (c - z)^{S - \frac32}$

has singularities at $a$, $b$ and $c$. Only those at $b$ and $c$ are of a type at which $\phi_{2S+1}(z)$ changes sign when $z$ is taken once around the singularity. It follows that

(49) $2 I_{2S+1} = \oint_\delta \phi_{2S+1}(z)\, dz,$

with the path of integration $\delta$ as indicated in Figure 3. Indeed, if the curve $\delta$ is contracted to run entirely along the real axis, from $b$ to $c$ and back to $b$, each of the two parts contributes $I_{2S+1}$, where the square roots are understood to be positive when going from $b$ to $c$.

[Figure 3. Paths of integration $\delta$ and $\epsilon$]

³ I am greatly indebted to Professor von Neumann for communicating this relation to me before its publication.





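(50) $-2 I_{2S+1} = \oint_\epsilon \phi_{2S+1}(z)\, dz,$

where $\epsilon$, as in Figure 3, encloses the only singularity not enclosed by $\delta$. In a neighborhood of $z = a$, $\phi_{2S+1}(z)$ has the expansion

(51) $\phi_{2S+1}(z) = \sum_{s=0}^{\infty} \dfrac{(z - a)^{-S + s}}{s!} \Big[\dfrac{d^s}{dz^s}\big\{(z - b)^{-1/2}(c - z)^{S - \frac32}\big\}\Big]_{z = a}.$

Only the term with $-S + s = -1$ contributes to (50). We selected the branch of $\phi_{2S+1}(z)$ for which $(z - b)^{-1/2}(c - z)^{S - \frac32}$ lies on the positive imaginary axis for real values of $z$ below $b$. This term can therefore be written

(52) $\dfrac{i\,(z - a)^{-1}}{(S - 1)!} \dfrac{d^{S-1}}{da^{S-1}}\big\{(b - a)^{-1/2}(c - a)^{S - \frac32}\big\},$

where positive square roots are now to be taken. The contribution of this term to (50) is $2\pi i$ times the coefficient of $(z - a)^{-1}$, and therefore

(53) $I_{2S+1} = \dfrac{\pi}{(S - 1)!} \dfrac{d^{S-1}}{da^{S-1}}\big\{(b - a)^{-1/2}(c - a)^{S - \frac32}\big\} = \dfrac{\pi\,\Gamma(S - \frac12)}{\Gamma(\frac12)\,\Gamma(S)}\,(b - a)^{-S + \frac12}(c - a)^{-1/2}(c - b)^{S - 1}.$

Since $\Gamma(\frac12) = \pi^{1/2}$, it follows that for odd values of $T$

(54) $I_T = \dfrac{\Gamma(\frac12)\,\Gamma(\frac12 T - 1)}{\Gamma(\frac12 T - \frac12)}\,(b - a)^{-\frac12 T + 1}(c - a)^{-1/2}(c - b)^{\frac12 T - \frac32}.$

It is easily seen that the same relation holds if $T = 2S$ is even. In that case it follows from (47) that

(55) $\dfrac{\partial^{S-1}}{\partial c^{S-1}} I_{2S} = (S - 2)!\,\dfrac{\partial}{\partial c}\int_b^c (z - a)^{-S + \frac12}(z - b)^{-1/2}\, dz = (S - 2)!\,(c - a)^{-S + \frac12}(c - b)^{-1/2}.$

By transformations similar to those in (53), it can likewise be proved that the right hand member of (54) has the same derivative of order $S - 1$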





with respect to $c$. Hence the two members of (54) differ by a polynomial $Q(c)$ in $c$ of degree at most $S - 2$, whose coefficients may depend on $a$ and $b$. However, both members of (54) and their first $S - 2$ derivatives with respect to $c$ vanish at $c = b$. Therefore $Q$ vanishes identically, and (54) holds for every integral value of $T$ greater than 2. Finally, inserting (54) in (46) gives an expression for $h_{T+1}(r')$ that corresponds to (36) with $T$ replaced by $T + 1$.
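Relation (54) can be checked against direct quadrature of (47). The sketch substitutes $z = b + (c-b)\sin^2\theta$. This turns (47) into $2(c-b)^{\frac12 T - \frac32}\int_0^{\pi/2} \cos^{T-3}\theta\,(z-a)^{-\frac12 T + \frac12}\,d\theta$, whose integrand is smooth, and then applies Simpson's rule. The values of $a$, $b$, $c$ are arbitrary choices, and the function names are hypothetical.

```python
import math

def I_numeric(T, a, b, c, N=2000):
    """Integral (47) after z = b + (c - b) sin^2(theta), which removes the
    end-point singularities; composite Simpson rule on [0, pi/2] (N even)."""
    def g(theta):
        z = b + (c - b) * math.sin(theta) ** 2
        return (2 * (c - b) ** (T / 2 - 1.5) * math.cos(theta) ** (T - 3)
                * (z - a) ** (-(T - 1) / 2))
    h = (math.pi / 2) / N
    s = g(0.0) + g(math.pi / 2)
    for i in range(1, N):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

def I_closed(T, a, b, c):
    """Right hand member of (54)."""
    return (math.gamma(0.5) * math.gamma(T / 2 - 1) / math.gamma(T / 2 - 0.5)
            * (b - a) ** (1 - T / 2) * (c - a) ** -0.5 * (c - b) ** (T / 2 - 1.5))

a, b, c = -1.3, 0.2, 0.9     # any values with a < b < c
for T in range(3, 10):
    print(T, I_numeric(T, a, b, c), I_closed(T, a, b, c))
```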

It remains to prove (36) for some initial value of $T$. For $T = 2$ the integral in (36) diverges, but the form of $h(r)$ is easily found directly. Writing

(56) $p = x_1^2 + x_2^2, \qquad r = \dfrac{\kappa_1 x_1^2 + \kappa_2 x_2^2}{x_1^2 + x_2^2},$

we find that

(57) $\dfrac{\partial(x_1, x_2)}{\partial(p, r)} = \begin{vmatrix} 2x_1 & 2x_2 \\ 2x_1(\kappa_1 - r)/p & 2x_2(\kappa_2 - r)/p \end{vmatrix}^{-1} = -\dfrac{p}{4x_1x_2(\kappa_1 - \kappa_2)} = \mp \dfrac{1}{4(\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}}.$

The probability density in the $x_1x_2$-plane is, of course, $(2\pi)^{-1}e^{-p/2}$. In making the transformation (57), however, a factor 4 must be applied, because each pair of values of $p$ and $r$ corresponds to 4 pairs of values of $x_1$ and $x_2$ that differ only in sign. This leads to the joint distribution of $p$ and $r$

(58) $\dfrac{1}{2\pi}\, e^{-p/2}\, \dfrac{dp\, dr}{(\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}},$

and, after integration over $p$, to

(59) $h(r) = \dfrac{1}{\pi(\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}}$  if $\kappa_2 < r < \kappa_1$,
     $h(r) = 0$  if $r < \kappa_2$ or $\kappa_1 < r$,

in accordance with (39).
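For $T = 2$, integrating (59) gives the arcsine distribution function $\frac{2}{\pi}\arcsin\{(r - \kappa_2)/(\kappa_1 - \kappa_2)\}^{1/2}$. The sketch compares this closed form with a simulation of (56), and checks that its derivative reproduces (59). The values of $\kappa_1$, $\kappa_2$ are hypothetical.

```python
import math
import random

def h2(r, k1, k2):
    """Density (59) for T = 2 (k1 > k2)."""
    if k2 < r < k1:
        return 1.0 / (math.pi * math.sqrt((k1 - r) * (r - k2)))
    return 0.0

def cdf2(r, k1, k2):
    """Integral of (59) from k2 to r: (2/pi) * arcsin(sqrt((r - k2)/(k1 - k2)))."""
    r = min(max(r, k2), k1)
    return 2.0 / math.pi * math.asin(math.sqrt((r - k2) / (k1 - k2)))

k1, k2 = 1.5, -0.5
rng = random.Random(5)
draws = 50_000
below = 0
for _ in range(draws):
    x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    r = (k1 * x1 ** 2 + k2 * x2 ** 2) / (x1 ** 2 + x2 ** 2)   # (56)
    below += r <= 0.7
print(below / draws, cdf2(0.7, k1, k2))
```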

Finally, inserting (59) in (45) with $T = 2$ gives

(60) $h_3(r') = \dfrac{1}{2\pi} \int_{[\kappa_2, r']}^{\kappa_1} \dfrac{(r - r')^{-1/2}(r - \kappa_3)^{-1/2}}{(\kappa_1 - r)^{1/2}(r - \kappa_2)^{1/2}}\, dr, \qquad \kappa_3 < r' < \kappa_1,$

where $[\kappa_2, r']$ denotes the larger of $\kappa_2$ and $r'$. Writing $k$ for $r$, we find that this expression is identical with (36) for $T = 3$ (with $r'$ in place of $r$), provided that the path $\gamma'_{r'}$ along the real axis is used and the rule given after (36) for selecting the branch of the integrand is observed. For on the path $\gamma'_{r'}$, the equal contributions from the two parts of the path between $[\kappa_2, r']$ and $\kappa_1$ reinforce each other. For $r' < \kappa_2$, the remaining contributions, from the interval between $r'$ and $\kappa_2$, add up to zero. This completes the second proof of (36).





5. Application to serial correlation. We now derive the characteristic values for the case

(61) $q = m = x_1x_2 + x_2x_3 + \cdots + x_{T-1}x_T.$

It is of interest to compare this case with the slightly modified quadratic form

(62) $\bar m = x_1x_2 + x_2x_3 + \cdots + x_{T-1}x_T + x_Tx_1,$

which contains an additional term $x_Tx_1$ that arranges the variables in a circle. Hotelling originally suggested this modification to simplify the characteristic polynomial. Other simplifications that arise from the circular arrangement appear below. It is of course possible that substituting $\bar m$ for $m$ slightly affects the power of the test of significance of serial correlation, but this conjecture needs to be checked by a study of power functions.

The characteristic values of $m$ are the values of $k$ for which the determinant of order $T$

(63) $\Delta_T = \begin{vmatrix} -k & \frac12 & 0 & \cdots & 0 & 0 \\ \frac12 & -k & \frac12 & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & \frac12 & -k \end{vmatrix} = 0.$

Expanding along the elements of the first row, we find that

(64) $\Delta_T = -k\Delta_{T-1} - \tfrac14 \Delta_{T-2},$

from which it follows that

(65) $\Delta_T = c_1\xi_1^T + c_2\xi_2^T,$

where $\xi_1$ and $\xi_2$ are the roots of

(66) $\xi^2 + k\xi + \tfrac14 = 0,$

satisfying

(67) $\xi_1 + \xi_2 = -k, \qquad \xi_1\xi_2 = \tfrac14.$

Inserting the known values $\Delta_1 = -k$ and $\Delta_2 = k^2 - \frac14$ in (65), we easily find $c_1$ and $c_2$ such that

(68) $\Delta_T = \dfrac{\xi_1^{T+1} - \xi_2^{T+1}}{\xi_1 - \xi_2}.$

As a polynomial in $k$ this is a rather complicated expression, but the implicit form (68) suffices for finding the roots of (63). Expressing all other variables in terms of one new variable $\omega$,






(69) $\xi_1 = -\dfrac{\omega}{2}, \qquad \xi_2 = -\dfrac{1}{2\omega}, \qquad k = \dfrac12(\omega + \omega^{-1}),$

we find for (68)

(70) $\Delta_T = \Big(-\dfrac12\Big)^T \dfrac{\omega^{T+1} - \omega^{-T-1}}{\omega - \omega^{-1}} = \Big(-\dfrac12\Big)^T \omega^{-T}\dfrac{\omega^{2T+2} - 1}{\omega^2 - 1}.$

The only values of $\omega$ for which this expression vanishes are the roots of

(71) $\omega^{2T+2} = 1,$

excepting those that are also roots of

(72) $\omega^2 = 1.$

This leaves

(73) $\omega = e^{\pm i\pi t/(T+1)}, \qquad t = 1, 2, \ldots, T.$

The corresponding characteristic values are

(74) $\kappa_t = \cos\dfrac{\pi t}{T + 1}, \qquad t = 1, 2, \ldots, T,$

since the positive and the negative sign in (73) give the same value of $\kappa_t$. These are $T$ different values, so each one is a single root of (63).
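The roots (74) can be confirmed by evaluating the determinant (63) through its first-row recurrence, and (68) can be cross-checked at an arbitrary $k$. This is a sketch, and the function names and the test value $k = 0.3$ are illustrative.

```python
import cmath
import math

def delta(T, k):
    """Determinant (63) from the first-row recurrence, with Delta_0 = 1, Delta_1 = -k."""
    prev, cur = 1.0, -k
    for _ in range(T - 1):
        prev, cur = cur, -k * cur - 0.25 * prev
    return cur

def delta_closed(T, k):
    """Closed form (68); xi_1, xi_2 are the roots of xi^2 + k xi + 1/4 = 0 (|k| != 1)."""
    root = cmath.sqrt(k * k - 1)
    xi1, xi2 = (-k + root) / 2, (-k - root) / 2
    return ((xi1 ** (T + 1) - xi2 ** (T + 1)) / (xi1 - xi2)).real

for T in range(2, 11):
    roots = [math.cos(math.pi * t / (T + 1)) for t in range(1, T + 1)]   # (74)
    print(T, max(abs(delta(T, k)) for k in roots), delta(T, 0.3) - delta_closed(T, 0.3))
```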

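The characteristic values of $\bar m$ can now be derived from (68), although a simple, straightforward method based on the properties of circulants is also available (see [6], p. 13). Writing

(75) $\bar\Delta_T = \begin{vmatrix} -k & \frac12 & 0 & \cdots & 0 & \frac12 \\ \frac12 & -k & \frac12 & \cdots & 0 & 0 \\ 0 & \frac12 & -k & \cdots & 0 & 0 \\ \vdots & & & \ddots & & \vdots \\ \frac12 & 0 & 0 & \cdots & \frac12 & -k \end{vmatrix} = \Delta_T - \tfrac14\Delta_{T-2} + 2(-1)^{T+1}\Big(\dfrac12\Big)^T,$

we easily find from (70) that

(76) $\bar\Delta_T = \Big(-\dfrac12\Big)^T\Big(\dfrac{\omega^{T+1} - \omega^{-T-1}}{\omega - \omega^{-1}} - \dfrac{\omega^{T-1} - \omega^{-T+1}}{\omega - \omega^{-1}} - 2\Big) = \Big(-\dfrac12\Big)^T(\omega^T + \omega^{-T} - 2) = -\Big(-\dfrac12\Big)^{T-1}(\cos T\alpha - 1)$

if

(77) $\omega = e^{i\alpha}.$

A complete set of the values of $\omega$ for which $\bar\Delta_T$ vanishes is given by

(78) $\alpha_t = \dfrac{2\pi t}{T}, \qquad t = 1, 2, \ldots, T,$

and, according to (69), the corresponding characteristic values⁴ $\bar\kappa_t$ are

⁴ In (79) the convention of numbering the characteristic values in decreasing order has been abandoned.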






(79) $\bar\kappa_t = \cos\alpha_t = \cos\dfrac{2\pi t}{T}, \qquad t = 1, 2, \ldots, T.$

Unlike the case without circular arrangement, the characteristic values with indices $t$ and $T - t$ now coincide. As a result, all characteristic values are double except one ($\bar\kappa_T = 1$) if $T$ is odd, and except two ($\bar\kappa_T = 1$, $\bar\kappa_{T/2} = -1$) if $T$ is even.
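For the circular form (62), the sketch below computes the determinant (75) directly by Gaussian elimination. It compares the result with the right hand member of (75) and with (76), and confirms that the determinant vanishes at the values (79). The function names and the test value $k = 0.3$ are illustrative, and the matrix construction assumes $T \ge 3$. For $T = 2$ the two neighbor entries would fall on the same position.

```python
import math

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n = len(a)
    result = 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda row: abs(a[row][i]))
        if a[piv][i] == 0.0:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            result = -result
        result *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return result

def circular_matrix(T, k):
    """Matrix of the form m_bar - k p for (62); assumes T >= 3."""
    a = [[0.0] * T for _ in range(T)]
    for i in range(T):
        a[i][i] = -k
        a[i][(i + 1) % T] = 0.5
        a[i][(i - 1) % T] = 0.5
    return a

def delta(T, k):
    """Non-circular determinant (63) from its first-row recurrence."""
    prev, cur = 1.0, -k
    for _ in range(T - 1):
        prev, cur = cur, -k * cur - 0.25 * prev
    return cur

for T in range(3, 10):
    k = 0.3
    via_75 = delta(T, k) - 0.25 * delta(T - 2, k) + 2 * (-1) ** (T + 1) * 0.5 ** T
    zeros = [det(circular_matrix(T, math.cos(2 * math.pi * t / T))) for t in range(1, T + 1)]
    print(T, det(circular_matrix(T, k)), via_75, max(map(abs, zeros)))
```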

Taking advantage of the fact that almost all characteristic values are double, Anderson [6] has derived expressions equivalent to (36) for this case, using methods that depend on this particular condition. From these results he has computed the 99- and 95-percentiles of the distribution of $\bar r = \bar m / p$ for $T = 2, 3, 4, 5, 6, 7, 9, 11, 13, 15, 25, 45$, and interpolated the percentiles for intermediate values of $T$. The 95-percentile for $T = 45$ is 0.240, compared with 0.261 for the normal distribution that provides an asymptotic approximation.

This comparison suggests that the normal approximation becomes accurate only slowly as $T$ increases. A method that gives a much closer approximation is available. It works out most simply for $\bar r$, but it can also be applied to $r$. The method's principle applies whenever the characteristic values follow a definite mathematical pattern.

The method replaces the finite set of discrete values $\bar\kappa_t$ in (36) by a continuous variable $\lambda$. This variable is distributed according to a density function that is suggested by, and approximates as closely as possible, the scatter of the values $\bar\kappa_t$. By (79), the values $\bar\kappa_t$ are ordinates of the cosine function at equidistant points that cover one complete period $2\pi$ of that function. It is natural to approximate this scatter by the density function

(80) $\chi(\lambda) = \dfrac{T}{\pi(1 - \lambda^2)^{1/2}}, \qquad -1 < \lambda < 1,$

of $\lambda = \cos(2\pi\tau/T)$, where the variable $\tau$ has a rectangular distribution between 0 and $T$. The numerical factor in (80) is chosen so that

(81) $\int_{-1}^{1} \chi(\lambda)\, d\lambda$

equals $T$, the total number of characteristic values to be replaced by a density function. Substituting $\chi(\lambda)$ for the $\bar\kappa_t$ is intended to give what intuitively seems, in some sense, the closest approximation to the exact distribution function $\bar h(\bar r)$ that has continuous derivatives of every order at every point except the two endpoints of its range ($\bar r = -1$ and $\bar r = +1$).

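The factor in the integrand in (36) which involves the $k_t$ is approximated as follows:

(82) $\displaystyle\prod_{t=1}^{T}(\kappa - k_t)^{-\frac12} \approx \exp\left[-\frac{T}{2\pi}\int_{-1}^{1}\frac{\log(\kappa-\lambda)}{(1-\lambda^2)^{1/2}}\,d\lambda\right].$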





TJALLING KOOPMANS


In order to evaluate the integral

(83) $J = \displaystyle\int_{-1}^{1}\frac{\log(\kappa-\lambda)}{(1-\lambda^2)^{1/2}}\,d\lambda$

we shall first prove that its real part is independent of $\kappa$, or that

(84) $\Re\displaystyle\int_{-1}^{1}\frac{\log(\kappa-\lambda)-\log(-\lambda)}{(1-\lambda^2)^{1/2}}\,d\lambda = 0,$

where $\Re$ denotes "the real part of". The integrand in (84) has singularities at the points $\lambda = -1, 0, \kappa, 1$. These are of two types. The singularities $\lambda = \pm 1$ are introduced by the denominator. They make the integrand change its sign if the argument $\lambda$ is turned once around either of them. The other singularities, $\lambda = 0$ and $\lambda = \kappa$, are introduced by the numerator. If, starting from a point on the real axis, we turn the argument $\lambda$ once around either of these, the real part of the integrand is not affected. In that case $2\pi i$ or $-2\pi i$ is added to the imaginary part of the numerator, depending on the sense (clockwise or counterclockwise) of the rotation and on the sign of the logarithm in (84) responsible for the singularity.

FIGURE 4. The integration path $p$

It follows that one revolution along a closed curve $p$ containing all four singularities, as indicated in Figure 4, carries us back to the same branch of the integrand. On the way, the additions to the imaginary part of the numerator offset one another, and the sign changes twice. This is in accordance with the regular character of the integrand at the point $\lambda = \infty$.

It follows furthermore that the left hand member of (84) can be replaced by

(85) $\dfrac12\,\Re\displaystyle\oint_{p}\frac{\log(\kappa-\lambda)-\log(-\lambda)}{(1-\lambda^2)^{1/2}}\,d\lambda.$

For suppose the curve $p$ is constricted to a path $p'$ running along the real axis from $-1$ to $+1$ and back to $-1$. The contributions of the two halves of the path are then equal, not only in absolute value but also with respect to sign. This holds also for the part of the path $p'$ between 0 and $\kappa$, because the real part $\log|\kappa-\lambda| - \log|\lambda|$ of the numerator is not affected by turning around the points 0 and $\kappa$. The same would not be true with respect to $\kappa$ in (83), where the imaginary part has not been eliminated.



SERIAL CORRELATION 




Finally, if $p$ in (85) is replaced by a large circle $|\lambda| = R$, the validity of (84) follows from the fact that (85) tends to zero as $R$ tends to infinity, because the integrand is of the order of magnitude of $\lambda^{-2}$.

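The real part of the integral in (83) accordingly is

(86) $\Re J = \displaystyle\int_{-1}^{1}\frac{\log|\lambda|}{(1-\lambda^2)^{1/2}}\,d\lambda = 2\int_{0}^{1}\frac{\log\lambda}{(1-\lambda^2)^{1/2}}\,d\lambda$

or, by the transformation $\lambda = \sin x$,

(87) $\Re J = 2\displaystyle\int_0^{\pi/2}\log\sin x\,dx = 2\int_0^{\pi/2}\log\cos x\,dx = \int_0^{\pi/2}\log(\sin x\cos x)\,dx = \int_0^{\pi/2}\log(\tfrac12\sin 2x)\,dx = \frac{\pi}{2}\log\frac12 + \frac12\Re J,$

so that

(88) $\Re J = -\pi\log 2.$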
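Here is a quick numerical check of (86)-(88). It evaluates the second member of (87) with a midpoint rule, which never touches the logarithmic singularity at $x = 0$. The Python below is an illustrative sketch. The helper name and the number of panels are arbitrary choices and not part of the original argument.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule; f is never evaluated at the endpoints a, b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Second member of (87): Re J = 2 * integral of log(sin x) over (0, pi/2).
re_J = 2 * midpoint(lambda x: math.log(math.sin(x)), 0.0, math.pi / 2, 200_000)

# (88) predicts Re J = -pi * log 2.
print(re_J, -math.pi * math.log(2))
```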

In order to evaluate the imaginary part $\Im J$ of (83), it is necessary to specify on which side the integration variable $\lambda$ passes the singularity $\kappa$. In fact, both cases need to be considered:

- the passage of $\lambda$ "over" $\kappa$, for values of $\kappa$ on the first part of the path of integration $\gamma$ of $\kappa$ in (36), where $\kappa$ goes along the real axis from $\bar r$ to 1;
- the passage of $\lambda$ "under" $\kappa$, for values of $\kappa$ on the second part of its path $\gamma$, from 1 back to $\bar r$.

If the upper sign in the following formulae relates to the first of these two cases, we have

(89) $i\,\Im J = \mp\pi i\displaystyle\int_{\kappa}^{1}\frac{d\lambda}{(1-\lambda^2)^{1/2}} = \mp\pi i\arccos\kappa,$

and, from (88) and (89), we find for the last member in (82)

(90) $e^{-\frac{T}{2\pi}J} = 2^{\frac{T}{2}}\,e^{\pm\frac12 iT\arccos\kappa}.$

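Writing

(91) $\arccos\kappa = \alpha, \qquad \kappa = \cos\alpha, \qquad e^{\frac12 iT\alpha} - e^{-\frac12 iT\alpha} = 2i\sin\tfrac12 T\alpha,$

we find the following approximation for $h(\bar r)$ by inserting (90) in (36) as indicated in (82):

(92) $h(\bar r) \approx \dfrac{(T-2)\,2^{\frac12 T-1}}{\pi}\displaystyle\int_0^{\arccos\bar r}(\cos\alpha-\bar r)^{\frac12 T-2}\,\sin\tfrac12 T\alpha\,\sin\alpha\,d\alpha.$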

Calculations of the distribution function and of its percentiles will be much 
simpler for this approximation than for the exact function. 
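The Python sketch below shows how simple the calculation becomes. It evaluates (92) by composite Simpson quadrature in $\alpha$ and obtains the distribution function by a second quadrature in $\bar r$. It assumes $T \ge 4$, so that the exponent $\frac12 T - 2$ is non-negative. The function names `simpson`, `h_circular` and `H_circular` and the grid sizes are illustrative assumptions.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def h_circular(rbar, T, n=400):
    """Approximation (92) to the density of the circular coefficient rbar."""
    if T < 4:
        raise ValueError("(92) is used here only for T >= 4")
    const = (T - 2) * 2 ** (T / 2 - 1) / math.pi

    def integrand(alpha):
        base = max(math.cos(alpha) - rbar, 0.0)  # guard against rounding below 0
        return base ** (T / 2 - 2) * math.sin(T * alpha / 2) * math.sin(alpha)

    return const * simpson(integrand, 0.0, math.acos(rbar), n)

def H_circular(x, T, n=200):
    """Distribution function: integrate (92) over rbar from -1 to x."""
    return simpson(lambda r: h_circular(r, T), -1.0, x, n)

print(h_circular(0.0, 10), H_circular(1.0, 10))  # density at 0; total area near 1
```

Percentiles could then be located by a root search, for example bisection, on `H_circular`.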

In the case of $r$, in which no circular arrangement is made, a slight complication arises. The characteristic values $k_t$ given in (74) are again ordinates of the cosine function at equidistant points, but they do not cover a complete period or half-period of this function.

Probably the most accurate procedure would be to replace the limits of integration in (83) by $\cos[\frac12\pi/(T+1)]$ and $\cos[(T+\frac12)\pi/(T+1)]$. Each discrete value of $t$ in (74) would then contribute an interval $(t-\frac12,\, t+\frac12)$ to the range of the rectangularly distributed variable $t$, which now determines the distribution of $\lambda = \cos[t\pi/(T+1)]$. The numerical factor in (80) would have to be adjusted so that the equivalent of (81) with the new limits of integration is satisfied. However, the evaluation of (83) and the simplicity of the result essentially rest on the fact that the limits of integration coincide with singularities of the integrand.

In these circumstances a rather simple result can again be obtained by introducing two further changes which very nearly compensate each other. The first change is the arbitrary extension of the limits of integration to what they are in (83), while the numerical factor in (80) is increased so that the integral in (81) becomes $T + 1$ instead of $T$. This leaves unaffected the described contributions of the discrete values of $t$ in (74) to the range of $t$. It does, however, add to that range the two intervals $(0, \frac12)$ and $(T+\frac12, T+1)$, each of half a unit length, which represent nothing that was already present. The second change largely offsets this: we introduce two additional discrete values $t = 0$ and $t = T + 1$, each with the negative weight $-\frac12$, the weight of all other discrete values being $+1$. Instead of (82) we then have

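(93) $\displaystyle\prod_{t=1}^{T}(\kappa-k_t)^{-\frac12} \approx \exp\left[-\frac{T+1}{2\pi}\int_{-1}^{1}\frac{\log(\kappa-\lambda)}{(1-\lambda^2)^{1/2}}\,d\lambda + \frac14\log(\kappa-1) + \frac14\log(\kappa+1)\right].$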

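If this expression is inserted in (38) with $\gamma$ constricted to $\gamma'$, the argument of

(94) $e^{\frac14\log(\kappa-1)+\frac14\log(\kappa+1)}$

is $-\pi/4$ when $\kappa$ goes from $r$ to 1, and $+\pi/4$ when $\kappa$ returns from 1 to $r$. On account of

(95) $(1-\kappa^2)^{1/4} = \sin^{1/2}\alpha, \qquad e^{\frac12 i(T+1)\alpha-\frac14\pi i} - e^{-\frac12 i(T+1)\alpha+\frac14\pi i} = 2i\sin\left[\tfrac12(T+1)\alpha - \tfrac14\pi\right],$

the result now is

(96) $h(r) \approx \dfrac{(T-2)\,2^{\frac12(T-1)}}{\pi}\displaystyle\int_0^{\arccos r}(\cos\alpha-r)^{\frac12 T-2}\,\sin\left[\tfrac12(T+1)\alpha-\tfrac14\pi\right]\sin^{3/2}\alpha\,d\alpha.$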

It can be shown without direct integration that the approximate expressions (92) and (96), like the exact density functions, have a total integral equal to unity. This follows from three facts:

- the difference of 2 between the degrees in $\kappa$ of the numerator and the denominator in (36) is preserved by the substitutions (82) and (93);
- the numerical value of the limit, for $\kappa \to \infty$, of $\kappa$ times the integrand in (36) is not changed;
- no singularities outside the segment $-1 \le \kappa \le 1$ of the real axis are introduced.

There is, of course, a certain degree of distortion involved in replacing the exact distribution functions by the smooth approximations derived. Such distortion is most serious in so far as it occurs at the tails of the distribution, where the usual significance limits are located. For instance, the exact distribution of $\bar r$ is asymmetric if $T$ is odd, and ranges from $\cos[(T-1)\pi/T]$ to $+1$, whereas the smooth approximation is symmetric and ranges from $-1$ to $+1$. In the case of $r$ both the exact distribution and the approximation are symmetric, but the former ranges from $\cos[T\pi/(T+1)]$ to $\cos[\pi/(T+1)]$, the latter from $-1$ to $+1$.

However, this difference is to some extent compensated by a curious anomaly in the function (96). This function actually dips below zero on symmetrically placed small intervals adjoining $-1$ and $+1$. The length of these intervals is of the order of the difference $1 - \cos[\pi/(T+1)]$ between unity and the highest characteristic value. Percentiles must therefore be counted on both sides from two points whose absolute value is smaller than unity. These points are defined by requiring that each of the two small parts of the area "under" the curve (96) lying outside them be algebraically zero.
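The anomaly can be seen directly in (96). If $\arccos r < \pi/[2(T+1)]$, the factor $\sin[\frac12(T+1)\alpha - \frac14\pi]$ is negative over the whole range of integration. The hedged Python sketch below evaluates (96) by Simpson quadrature for the illustrative choice $T = 10$. It also checks that the total area is close to unity. The Simpson helper is repeated so the block stands alone, and the names and grid sizes are assumptions.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def h_noncircular(r, T, n=400):
    """Approximation (96) to the density of the non-circular coefficient r."""
    if T < 4:
        raise ValueError("(96) is used here only for T >= 4")
    const = (T - 2) * 2 ** ((T - 1) / 2) / math.pi

    def integrand(alpha):
        base = max(math.cos(alpha) - r, 0.0)
        sin_a = max(math.sin(alpha), 0.0)
        phase = (T + 1) * alpha / 2 - math.pi / 4
        return base ** (T / 2 - 2) * math.sin(phase) * sin_a ** 1.5

    return const * simpson(integrand, 0.0, math.acos(r), n)

T = 10
# For r above this point the sine factor is negative on the whole range of alpha.
edge = math.cos(math.pi / (2 * (T + 1)))
print(h_noncircular(0.0, T), h_noncircular(0.995, T), edge)
area = simpson(lambda r: h_noncircular(r, T), -1.0, 1.0, 200)
print(area)  # close to 1
```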

These distortions have importance only for small values of $T$. Anderson finds ([6], p. 52) that the exact function $h(\bar r)$ is symmetrical within three-decimal accuracy for all values of $T \ge 11$. (The modal value $h(0)$ for $T = 11$ is about 1.27.)

In the case of $\bar r$, the number of characteristic values $k_t$ exceeding the 95-percentile as given by Anderson is 3 for $T = 7$, 5 for $T = 13$, and 11 for $T = 25$. The corresponding numbers for the 99-percentile are 3 for $T = 13$, 9 for $T = 25$, and 17 for $T = 45$. These numbers suggest that the approximations (92) and (96) will provide good significance limits long before the normal approximation is acceptable. Accurate calculations will be needed to find out from what value of $T$ onward the approximations can safely be substituted for the exact distributions.

REFERENCES

[1] J. Tinbergen, Business Cycles in the United States of America 1919-1932, League of Nations, Geneva, 1939.
    M. M. Flood, "Recursive methods in business-cycle analysis," Econometrica, Vol. 8 (1940), p. 333.

[2] H. Wold, A Study in the Analysis of Stationary Time Series, Uppsala, 1938.

[3] S. S. Wilks, Statistical Inference, Ann Arbor, 1937.

[4] J. von Neumann, "Distribution of the ratio of the mean square successive difference to the variance," Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

[5] J. von Neumann, "A further remark concerning the distribution of the ratio of the mean square successive difference to the variance," Annals of Math. Stat., Vol. 13 (1942), pp. 86-88.

[6] R. L. Anderson, Serial Correlation in the Analysis of Time Series, unpublished thesis, Iowa State College, 1941.



A GENERALIZED ANALYSIS OF VARIANCE
By Franklin E. Satterthwaite

University of Iowa and Aetna Life Insurance Company

The analysis of variance is a statistical technique whose fields of application are only beginning to be explored. A few simple standard designs appear in the literature, and a great deal has been done with them. However, if the applied statistician limits himself to such standard designs, he soon finds that many of his problems are receiving inadequate or inappropriate treatment. The writer has found this particularly true in his own field, where most of the raw data are in the nature of frequencies or averages which lack homogeneity of variance. Also, the nature of the problem usually indicates the use of weighted averages rather than simple averages, and sometimes part of the data are missing.

The purpose of this study is to examine the fundamental principles underlying analysis of variance designs. It also shows how designs may be constructed and applied to practically any data which can be assumed to be normally distributed.


1. Test of independence. In the analysis of variance we calculate two or more statistics of the types

$\chi^2 = \sum(x_i - m_i)^2, \qquad \chi'^2 = \sum\theta_j^2.$

The $x_i$'s are considered to be independent variables from a normal population. The $m_i$'s and the $\theta_j$'s are homogeneous linear functions of the $x_i$'s. Heretofore, the independence of the $\chi^2$'s used has only been demonstrated for certain special $\theta_j$'s and $m_i$'s. To make our analysis general, we shall let our $\theta_j$'s be general homogeneous linear functions of the $x_i$'s, and we shall define our $m_i$'s through certain linear homogeneous restrictions.

Let us define Chi-square as

$\chi^2 = \sum(x_i - m_i)^2,$

where the $x_i$'s are independent normally distributed variables with mean zero and unit variance. We also define certain linear functions of the $x_i$'s,¹

(1) $\theta_j = a_{ji}x_i, \qquad j = 1, 2, \ldots, s,$

which we shall assume to have been orthogonalized.² To define the $m_i$'s we make use of the linear restrictions


¹ Repeated subscripts will always indicate summation with respect to that subscript; the subscripts range from 1 to $n$ unless otherwise specified. The Kronecker delta, $\delta_{ij}$, equals one or zero depending on whether $i$ equals or does not equal $j$.

² The $\theta_j$ are orthogonal if $a_{ji}a_{ki} = \delta_{jk}$. Any algebraically independent set may be





(2) $a_{ji}(x_i - m_i) = 0, \qquad j = 1, 2, \ldots, s,$

or

$a_{ji}m_i = a_{ji}x_i = \theta_j.$

This system has an $(n - s)$-infinitude of solutions, and we should not expect all of these to be suitable for our purposes. For reasons which will appear later, we shall choose the single solution

(3) $m_k = a_{jk}\theta_j = a_{jk}a_{ji}x_i, \qquad j = 1, \ldots, s.$

This is the solution which follows if we complete the system (2) with $n - s$ additional linear restrictions on the $m_i$'s. These restrictions are homogeneous and form an orthogonal set with (2). Thus

$a_{ji}m_i = \theta_j, \qquad j = 1, \ldots, s,$
$a_{ji}m_i = 0, \qquad j = s+1, \ldots, n.$

This is consistent with standard analysis of variance designs. For the usual one way analysis with $s$ groups of $r$ observations $x_{ji}$, we have

(4) $\displaystyle\sum_{i=1}^{r}\frac{m_{ji}}{\sqrt r} = \sum_{i=1}^{r}\frac{x_{ji}}{\sqrt r}, \qquad j = 1, \ldots, s,$

which yield a solution according to (3),

$m_{ji} = \dfrac1r\displaystyle\sum_{i=1}^{r}x_{ji} = \bar x_j, \qquad j = 1, \ldots, s.$

The additional homogeneous restrictions in this case might have been taken as

$m_{j1} = m_{j2} = \cdots = m_{jr}, \qquad j = 1, \ldots, s,$

which are orthogonal to (4) and may easily be orthogonalized among themselves. Substituting the values of the $m_i$'s obtained in (3) into Chi-square, we obtain

$\chi^2 = (x_i - m_i)(x_i - m_i)$
$\quad = (\delta_{ik} - a_{ji}a_{jk})x_k\,(\delta_{il} - a_{mi}a_{ml})x_l, \qquad j, m = 1, \ldots, s,$
$\quad = (\delta_{kl} - a_{jk}a_{jl} - a_{mk}a_{ml} + \delta_{jm}a_{jk}a_{ml})x_kx_l$
$\quad = (\delta_{kl} - a_{jk}a_{jl})x_kx_l.$


replaced by an equivalent orthogonal set. Thus, if $\theta_2$ is not orthogonal to $\theta_1$, it may be replaced by $\theta_2' = \theta_2 + k\theta_1$, where $k$ is determined by

$a_{1i}(a_{2i} + ka_{1i}) = 0,$

or $k = -a_{1i}a_{2i}/a_{1i}a_{1i}$. The condition $\sum_i a_{ji}^2 = 1$ can always be met by simple division.





The sum of the squares of the $\theta_j$'s is

$\theta_j\theta_j = a_{jk}a_{jl}x_kx_l, \qquad j = 1, \ldots, s.$

Therefore we have the relation

(5) $\chi^2 + \theta_1^2 + \cdots + \theta_s^2 = \delta_{kl}x_kx_l = \sum x_i^2.$

The rank, $R_j$, of each $\theta_j^2$ is obviously equal to unity, since it is the square of a linear form. The rank, $R_0$, of $\chi^2$ is at least equal to $n - s$, since the rank of the right hand side of (5) is $n$. Also, $R_0$ cannot be greater than $n - s$, since

$a_{mk}(\delta_{kl} - a_{jk}a_{jl}) = a_{ml} - \delta_{mj}a_{jl} = 0, \qquad m, j = 1, \ldots, s,$

gives $s$ independent relations between the rows of its coefficient matrix. Therefore we have the relation

(6) $R_0 + R_1 + \cdots + R_s = n.$

The two conditions (5) and (6) are sufficient³ conditions for $\chi^2$ and the $\theta_j^2$'s to be each independent of the others. They also ensure that each is distributed as Chi-square, with the number of degrees of freedom equal to its rank.
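The construction of this section can be checked numerically. The sketch below (Python, standard library only) forms the $\theta_j$ of (1) from orthonormal rows, then the $m_k$ of (3), then $\chi^2$. It does this for a hypothetical one-way layout of the kind in (4), with $s = 2$ groups of $r = 3$ observations. It then confirms relation (5) and the restrictions (2). The function name `chi_square_split` and the layout are illustrative.

```python
import math
import random

def chi_square_split(A, x):
    """Given orthonormal rows A (s rows of length n) and data x, return
    (chi2, thetas, m) as defined by (1), (3) and chi2 = sum (x_i - m_i)^2."""
    s, n = len(A), len(x)
    thetas = [sum(A[j][i] * x[i] for i in range(n)) for j in range(s)]  # (1)
    m = [sum(A[j][k] * thetas[j] for j in range(s)) for k in range(n)]  # (3)
    chi2 = sum((x[i] - m[i]) ** 2 for i in range(n))
    return chi2, thetas, m

# One-way layout as in (4): s = 2 groups of r = 3, coefficients 1/sqrt(r).
r = 3
w = 1 / math.sqrt(r)
A = [[w] * r + [0.0] * r,
     [0.0] * r + [w] * r]
random.seed(1)
x = [random.gauss(0.0, 1.0) for _ in range(2 * r)]
chi2, thetas, m = chi_square_split(A, x)

print(m[:r], sum(x[:r]) / r)  # each m_{1i} equals the first group mean
print(chi2 + sum(t * t for t in thetas), sum(v * v for v in x))  # relation (5)
```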


2. Adjustment of data. The above development is not general enough for many practical problems. We do not always have given data, $y_i$, which are normally distributed about a mean zero with unit (or homogeneous) variance. Of course, if the means, $m_i$, and variances, $\sigma_i^2$, are known, we may make the transformation

(7) $x_i = \dfrac{y_i - m_i}{\sigma_i}$

and apply our theory in a straightforward manner. We shall now check the effect on our analysis if the $m_i$'s and $\sigma_i$'s are determined, at least in part, from our data, the $y_i$'s.

Let us assume that the $x_i$'s of (7) are normally and independently distributed variables about a mean zero and with unit variance. Let us also define certain linear orthogonal functions of the first $r$ of the $\theta_j$'s by

(8) $\phi_k = b_{ki}\theta_i = b_{ki}a_{ij}x_j, \qquad k = 1, 2, \ldots, q; \quad i = 1, 2, \ldots, r.$

We next form the characteristic function of the joint distribution of $\chi^2$, of $\sum_1^r\theta_i^2$, of $\theta_{r+1}^2, \ldots, \theta_s^2$, and of $\phi_1, \ldots, \phi_q$. This is

³ A. T. Craig, "On the independence of certain estimates of variance," Annals of Math. Stat., Vol. 9 (1938), pp. 46-65.





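$\Phi(t, u, v_{r+1}, \ldots, v_s, w_1, \ldots, w_q)$
$\quad = K\displaystyle\int\cdots\int\exp\Big[it\chi^2 + iu\sum_1^r\theta_i^2 + i\sum_{r+1}^{s}v_j\theta_j^2 + i\sum_1^q w_k\phi_k - \tfrac12\sum x_i^2\Big]\,dx_1\cdots dx_n.$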

The conditions (5) and (6) are sufficient⁴ for there to exist an orthogonal transformation of the $x_i$'s which will convert

$\theta_j$ to $\theta_j$, $\quad j = 1, \ldots, s$,
$\chi^2$ to $\sum_{s+1}^{n}\theta_j^2$,
$\sum x_i^2$ to $\sum_1^n\theta_j^2$,
$\prod dx_i$ to $\prod d\theta_j$.

The characteristic function then takes the form

$\Phi = K\displaystyle\prod_{i=1}^{r}\Big\{\int\exp\Big[-\tfrac12(1-2iu)\Big(\theta_i - \frac{i\,w_kb_{ki}}{1-2iu}\Big)^2\Big]d\theta_i\;\exp\Big[-\frac{(w_kb_{ki})^2}{2(1-2iu)}\Big]\Big\}$
$\qquad\times\displaystyle\prod_{j=r+1}^{s}\int\exp\big[-\tfrac12(1-2iv_j)\theta_j^2\big]\,d\theta_j\;\prod_{j=s+1}^{n}\int\exp\big[-\tfrac12(1-2it)\theta_j^2\big]\,d\theta_j,$

where

$\displaystyle\sum_{i=1}^{r}(w_kb_{ki})^2 = \sum_{k=1}^{q}w_k^2,$

since the $b_{ki}$'s are orthogonal.

At the beginning of this section we stated that we wished in some way to use our data, the $y_i$'s, to estimate the $m_i$'s and the $\sigma_i$'s. A suitable method is to restrict the functions (8) to zero.

Our problem thus reduces to finding the distribution of the "array" in our joint distribution for which

$\phi_1 = \phi_2 = \cdots = \phi_q = 0.$

Except perhaps for a constant factor, the characteristic function of the distribution of such an array is obtained from $\Phi$ by integrating out the $w_k$'s.⁵ Thus, on performing the integrations, we have
on performing the integrations, we have, 

⁴ See A. T. Craig, ibid.

⁵ This is easily seen as follows. Pass from the characteristic function to the joint distribution, equate the $\phi$'s to zero, and then pass back to the characteristic function. All the integrations except the above then appear in pairs of the form

$\dfrac{1}{2\pi}\displaystyle\int e^{-iw\phi}\Big[\int e^{iw'\phi}\,\Phi(w')\,dw'\Big]d\phi = \Phi(w),$

which leave $\Phi$ unchanged.





$\Phi'(t, u, v_{r+1}, \ldots, v_s) = K'(1-2iu)^{-\frac12(r-q)}\displaystyle\prod_{j=r+1}^{s}(1-2iv_j)^{-\frac12}\,(1-2it)^{-\frac12(n-s)}.$

This shows that $\sum_1^r\theta_i^2$, $\theta_{r+1}^2, \ldots, \theta_s^2$, and $\chi^2$ are each independent of the others. Each is distributed according to the Chi-square distribution, with $r - q$, 1, ..., 1, and $n - s$ degrees of freedom respectively.


3. Numerical application. The developments of the preceding sections have been abbreviated to cover technical points alone. We shall now take a definite practical problem and see how we may work out its solution with the aid of the above techniques.

Table I gives the losses, the exposures (in car years) and the indicated pure premiums from the Massachusetts Statutory Liability automobile insurance experience. The data cover four towns and three different classes of cars. (To illustrate the effect of missing items, the data for one class in each of two towns, one of them town C, have been omitted.) Our problem is to determine whether there is a significant variation in the indicated pure premium between the different towns and between the different classes of cars.

Our first problem is to set up a normally distributed variable about a mean zero and with homogeneous variance. The true mean, $m_i$, of the distribution of the indicated pure premiums, $P_i$, is unknown. Under the hypothesis that the different towns and classes of cars are homogeneous with each other, we may assume that the $m_i$'s are all equal. We may estimate their common value by the combined indicated pure premium $\bar P$ for the whole territory, which is \$32.44. A preliminary argument, which need not concern us here, shows that the variance, $\sigma_i^2$, of an indicated pure premium is inversely proportional to the exposure, $E_i$, on which it is based. The constant of proportionality, however, is unknown. If we now make the assumption that the indicated pure premiums are normally distributed, we may convert them to the form


$x_i = (P_i - \bar P)\sqrt{E_i},$

which will be normally distributed about a mean zero with homogeneous variance $\sigma^2$. These statistics are given in the table.

Since $\bar P$ = \$32.44 was determined from the data, the $x_i$'s are subject to the single homogeneous linear restriction

(9) $0 = \sum(L_i - \bar PE_i) = \sum E_i^{1/2}\,(P_i - \bar P)E_i^{1/2} = \sum E_i^{1/2}x_i,$

where $L_i = P_iE_i$ is the loss in cell $i$.
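Restriction (9) costs one degree of freedom, because the common mean is estimated from the data. A simulation can illustrate this. The sketch assumes, as the text does, that $P_i$ is normal with variance inversely proportional to $E_i$. The exposures, mean and scale below are hypothetical values, not the Massachusetts data of Table I.

```python
import random

def restricted_ss(P, E):
    """Sum of x_i^2 with x_i = (P_i - Pbar) * sqrt(E_i), where Pbar is the
    exposure-weighted mean sum(E_i P_i) / sum(E_i); restriction (9) then holds."""
    Pbar = sum(e * p for p, e in zip(P, E)) / sum(E)
    return sum(e * (p - Pbar) ** 2 for p, e in zip(P, E))

random.seed(2)
E = [120.0, 45.0, 300.0, 80.0, 60.0]  # hypothetical exposures
mean, sigma = 30.0, 50.0              # hypothetical common mean and scale
trials = 20_000
avg = sum(
    restricted_ss([random.gauss(mean, sigma / e ** 0.5) for e in E], E) / sigma ** 2
    for _ in range(trials)
) / trials
print(avg)  # close to len(E) - 1 = 4, not len(E) = 5
```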





The next step is to express the indicated pure premiums for each town and for each class of car as $\theta_j$'s, as defined in equation (1). For town A we have an indicated pure premium of \$33.21 when all classes of cars are combined. This breaks down as follows:

$33.21 = \sum_A E_iP_i \big/ \sum_A E_i$
$\qquad = \big[\sum_A E_i\,(x_i/E_i^{1/2} + 32.44)\big] \big/ \sum_A E_i$
$\qquad = \sum_A\big(E_i^{1/2}/\sum_A E_i\big)\,x_i + 32.44,$

where $\sum_A$ denotes summation over the cells of town A. Dividing this by the square root of the sum of the squares of the coefficients, we obtain

$\theta_1 = \big(\sum_A E_i\big)^{1/2}(33.21 - 32.44)$
(10) $\qquad = \sum_A\big[E_i^{1/2}/\big(\sum_A E_i\big)^{1/2}\big]\,x_i,$

which is of the form of (1). We have entered in the table the coefficient of $x_i$, omitting the common denominator $(\sum_A E_i)^{1/2}$; the square of that denominator is entered under Restriction (1). Similarly, we have entered the values for the other towns under Restrictions (2), (3), and (4). The values for the classes of cars are entered under (5), (6), and (7).

The next step is to orthogonalize the $\theta$'s. The first four have no common elements, so they are orthogonal by inspection. To make $\theta_5$ orthogonal to $\theta_1$ we must add to $\theta_5$

$k_{51} = -\sum a_{5i}a_{1i}\big/\sum a_{1i}^2$

times $\theta_1$. This and similar coefficients for making $\theta_5$ orthogonal to $\theta_2$, $\theta_3$, and $\theta_4$ are entered on line (2′). We may now replace $\theta_5$ by the equivalent $\theta_5'$ by the formula

(11) $a_{5i}' = a_{5i} + k_{51}a_{1i} + k_{52}a_{2i} + k_{53}a_{3i} + k_{54}a_{4i}.$

Similar $k$'s for $\theta_6$ are entered on line (3′), and $\theta_6$ is replaced by $\theta_6'$. $\theta_7$ should be ignored, since it is algebraically dependent on the other $\theta$'s:

$\theta_7 = \theta_1 + \theta_2 + \theta_3 + \theta_4 - \theta_5 - \theta_6.$

Note that on line (4′) we have entered $\sum a_{ij}$ for checking the calculation (11). We next calculate the $\theta_j^2$'s according to the formula

$\theta_j^2 = \big[\sum a_{ji}x_i\big]^2\big/\sum a_{ji}^2.$

Note that for this particular design all the $\theta_j^2$'s except $\theta_5'^2$ and $\theta_6'^2$ are numerically equal to the corresponding $a_{ij}$'s (enclosed in parentheses).
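Formula (11) is one Gram-Schmidt step carried out on the unnormalized coefficient rows. The sketch below applies it to a hypothetical layout of four towns and three classes with two cells missing, which gives ten cells as in the text. It uses the coefficients $E_i^{1/2}$. The exposures are invented for illustration and are not the values of Table I.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(row, basis):
    """Formula (11): add k_j * basis_j to row, k_j = -(row . basis_j) / (basis_j . basis_j).
    Assumes the basis rows are already mutually orthogonal, as theta_1..theta_4 are."""
    out = list(row)
    for b in basis:
        k = -dot(row, b) / dot(b, b)
        out = [o + k * bj for o, bj in zip(out, b)]
    return out

# Hypothetical cells (town, class, exposure); two town-class cells are missing.
cells = [("A", 1, 120), ("A", 2, 45), ("A", 3, 300),
         ("B", 1, 80), ("B", 3, 60),
         ("C", 1, 150), ("C", 2, 30),
         ("D", 1, 90), ("D", 2, 70), ("D", 3, 200)]
root = [math.sqrt(e) for _, _, e in cells]
towns = [[r if t == town else 0.0 for (t, _, _), r in zip(cells, root)] for town in "ABCD"]
classes = [[r if c == cls else 0.0 for (_, c, _), r in zip(cells, root)] for cls in (1, 2, 3)]

theta5 = orthogonalize(classes[0], towns)
theta6 = orthogonalize(classes[1], towns + [theta5])
print([round(dot(theta6, b), 12) for b in towns + [theta5]])  # numerically zero
```

The row for the third class is not needed, because the town rows and the class rows have the same sum. This is the dependence noted for $\theta_7$.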

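Returning to equation (9), we see that it is equivalent to either of the following restrictions on the $\theta_j$'s:

$\displaystyle\sum_{j=1}^{4}\Big(\sum_{(j)}E_i\Big)^{1/2}\theta_j = 0$

or

$\displaystyle\sum_{j=5}^{7}\Big(\sum_{(j)}E_i\Big)^{1/2}\theta_j = 0,$

where $\sum_{(j)}$ denotes summation over the cells entering $\theta_j$.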


Therefore we may conclude that

$S_1/\sigma^2 = (\theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2)/\sigma^2 = 96{,}469/\sigma^2$

is distributed as Chi-square with three degrees of freedom. Also we may conclude that

$S_2/\sigma^2 = (\theta_5^2 + \theta_6^2 + \theta_7^2)/\sigma^2 = 79{,}349/\sigma^2$

is distributed as Chi-square with two degrees of freedom. Note that we have not proved, and indeed it is not so, that $S_1$ and $S_2$ are independent.

We have yet to obtain our interaction sum of squares. Equation (5) is of assistance here, giving

$S_3/\sigma^2 = \big[\sum x_i^2 - (\theta_1^2 + \theta_2^2 + \theta_3^2 + \theta_4^2 + \theta_5'^2 + \theta_6'^2)\big]/\sigma^2 = (395{,}360 - 173{,}051)/\sigma^2 = 222{,}309/\sigma^2.$

This is distributed as Chi-square with $10 - 6 = 4$ degrees of freedom. It is also independent of $S_1$ and of $S_2$.

Lastly we form the variance ratios

$F_1 = \dfrac{96{,}469/3}{222{,}309/4} = 0.58, \qquad F_2 = \dfrac{79{,}349/2}{222{,}309/4} = 0.71,$

which are not significant.

We therefore conclude that, as far as the present data and analysis show, we have no reason to believe that these three classes of cars and these four towns are not all subject to the same true premium rate.
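The variance ratios can be recomputed from the sums of squares and degrees of freedom quoted above. A trivial Python sketch:

```python
# Sums of squares (in units of sigma^2) and degrees of freedom from the text.
S1, df1 = 96_469, 3    # towns
S2, df2 = 79_349, 2    # classes of cars
S3, df3 = 222_309, 4   # interaction, 10 - 6 degrees of freedom

F1 = (S1 / df1) / (S3 / df3)
F2 = (S2 / df2) / (S3 / df3)
print(f"{F1:.2f} {F2:.2f}")  # 0.58 0.71
```

Both ratios are below 1. The upper 5 per cent points of $F$ for (3, 4) and (2, 4) degrees of freedom are well above 1 (about 6.6 and 6.9), which is consistent with the statement that neither ratio is significant.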



DISTRIBUTIONS IN STRATIFIED SAMPLING

By Paul H. Anderson
University of Illinois

1. Introduction. In this paper, distributions of means and standard deviations will be derived for random and stratified samples. It is not necessary to define random sampling here, for one may find it defined in any elementary text.

Suppose that, before the sample is drawn, a population $\pi$ is divided into several strata $\pi_1, \pi_2, \ldots, \pi_s$. Suppose further that the sample $\Sigma$ is composed of $s$ partial samples $\Sigma_1, \Sigma_2, \ldots, \Sigma_s$, each drawn with or without replacement from the corresponding stratum. If the sizes $m_i$ of the partial samples are proportionate to the sizes $M_i$ of the corresponding strata, i.e., $m_i = kM_i$, then the sample obtained in this manner is a stratified sample. When the sizes of the partial samples are not proportionate to the sizes of the corresponding strata, the distributions of means and standard deviations will differ from those obtained in the proportionate case. This will be shown in the sections that follow.

The distributions of means and standard deviations from well-known populations for stratified and random samples will be derived and compared as to scatter and symmetry. In some cases stratification has little to recommend its use over random sampling. It should be remembered, however, that the impossibility of obtaining random samples may make its use necessary. Most of the problems with which the practical statistician is confronted are of the kind which make random sampling difficult or even impossible. For this reason stratified sampling is being investigated by many research workers.


2. The distribution of means and standard deviations for samples of two drawn from any population having a continuous frequency function. Let $f(x)$ be a continuous frequency function whose mean is zero, with $f(x) > 0$ for $a < x < b$ and $f(x) = 0$ elsewhere. We select a sample of two elements $x_1, x_2$, which can be represented by a point in a square of side $b - a$, as point P in Fig. 1. It is well known that the probability of getting a sample point in the element of area $dx_1\,dx_2$ is $f(x_1)f(x_2)\,dx_1\,dx_2$. Consider the line RT (Fig. 1), whose equation is $x_1 + x_2 = 2\xi$. The probability of getting a mean less than the mean represented by a point on this line is

(1) $\displaystyle\int_a^{2\xi-a}dx_1\int_a^{2\xi-x_1}dx_2\,f(x_1)f(x_2).$

The distribution of $\xi < \frac12(a+b)$ is

(2) $2\displaystyle\int_a^{2\xi-a}f(x_1)f(2\xi-x_1)\,dx_1,$




which is obtained by differentiating (1) with respect to $\xi$. For all values $\xi > \frac12(a+b)$ there is another equation, which we shall now derive similarly.

The probability of obtaining a mean less than the mean of any point on R′T′ (Fig. 1) is

$1 - \displaystyle\int_{2\xi-b}^{b}dx_1\int_{2\xi-x_1}^{b}dx_2\,f(x_1)f(x_2).$

Differentiating this expression, we obtain

(3) $2\displaystyle\int_{2\xi-b}^{b}f(x_1)f(2\xi-x_1)\,dx_1.$

The distribution of means is given by (2) and (3). Let us apply the theorem to the rectangular population

$f(x) = 1, \quad -\tfrac12 < x < \tfrac12 \quad (a = -\tfrac12,\ b = \tfrac12),$
$f(x) = 0$ elsewhere.

Substituting in (2) and (3) respectively, we obtain

$g(\xi) = 2(1 + 2\xi)$, for $\xi < 0$,
$g(\xi) = 2(1 - 2\xi)$, for $\xi > 0$.

J. O. Irwin [1] and Philip Hall [2] also obtained these results, but by different methods. However, the distribution of $2\xi$ was known to Laplace and other earlier writers.

Consider the line AB of Fig. 1, whose equation is $x_2 - x_1 = 2S$. From the figure it is seen that the probability of obtaining a value of $\sigma$ (standard deviation) less than the value $S$ on AB is

$1 - 2\displaystyle\int_a^{b-2S}dx_1\int_{x_1+2S}^{b}dx_2\,f(x_1)f(x_2).$

Upon differentiating this expression with respect to $S$, we obtain

(4) $h(S) = 4\displaystyle\int_a^{b-2S}f(x)f(2S+x)\,dx.$

For the rectangular population $h(S) = 4(1 - 2S)$. This result agrees with that found by P. R. Rider [3].

Fig. 1

3. Sampling from a rectangular population. Let the rectangular population be $f(x) = 1$ for $0 < x < 1$, and $f(x) = 0$ elsewhere. From this population we select a stratified sample of two elements, chosen so that $0 < x_1 < \frac12$ and $\frac12 < x_2 < 1$. Consider the line RT (Fig. 2), whose equation is $x_1 + x_2 = 2\xi$, $\frac14 < \xi < \frac12$. The probability of obtaining a mean less than the mean of any point on RT is

$4\displaystyle\int_0^{2\xi-\frac12}dx_1\int_{\frac12}^{2\xi-x_1}dx_2 = 8(\xi - \tfrac14)^2.$

Similarly, consider the line R′T′ (Fig. 2), whose equation is $x_1 + x_2 = 2\xi$, $\frac12 < \xi < \frac34$. The probability of obtaining a mean less than the mean of any point on R′T′ is

$1 - 4\displaystyle\int_{2\xi-1}^{\frac12}dx_1\int_{2\xi-x_1}^{1}dx_2 = 1 - 8(\tfrac34 - \xi)^2.$

Differentiating the right-hand sides of these two equations with respect to $\xi$, we obtain the distribution of means of stratified samples of two elements from a rectangular distribution function:

(5) $g(\xi) = 16\xi - 4$, for $\frac14 < \xi < \frac12$,
$\qquad g(\xi) = 12 - 16\xi$, for $\frac12 < \xi < \frac34$.

The distribution of means for random samples of two elements from the same population is

(6) $g(\xi) = 4\xi$, for $0 < \xi < \frac12$,
$\qquad g(\xi) = 4 - 4\xi$, for $\frac12 < \xi < 1$.


From (5) and (6) we see that

A. The stratified sample means are more stable than the random means.

B. The random means and the stratified sample means are both distributed symmetrically.

C. The range of the random means is twice the range of the stratified means.

It remains to find the distributions of the standard deviations for samples of two elements where one element is selected from each half of the population. All points on line AB (Fig. 2) have the same standard deviation, and the equation of AB is $x_2 - x_1 = 2S$. For $0 < S < \frac14$, the probability of obtaining a standard deviation less than that of any point on AB is

$4\displaystyle\int_{\frac12-2S}^{\frac12}dx_1\int_{\frac12}^{x_1+2S}dx_2 = 8S^2.$

Next consider the line A′B′ (Fig. 2), whose equation is $x_2 - x_1 = 2S$, $\frac14 < S < \frac12$. The probability of getting a standard deviation less than the standard deviation on A′B′ is

$1 - 4\displaystyle\int_{2S}^{1}dx_2\int_{0}^{x_2-2S}dx_1 = -1 + 8S - 8S^2.$

Differentiating the right-hand sides of these two equations with respect to $S$ gives the distribution of standard deviations of stratified samples of two elements from a rectangular distribution function:

(7) $h(S) = 16S$, for $0 < S < \frac14$,
$\qquad h(S) = 8 - 16S$, for $\frac14 < S < \frac12$.

The distribution of the standard deviations for random samples of two elements is

(8) $h(S) = 4(1 - 2S)$, for $0 < S < \frac12$.

From (7) and (8) it is easily seen that

A. The range of the standard deviations is the same for stratified and random samples.

B. The distribution of standard deviations for random samples of two elements is skewed, but the distribution of the standard deviations for stratified samples of two elements is symmetrical.
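The conclusions drawn from (5)-(8) can be checked by simulation. The Python sketch below draws random and stratified samples of two from the rectangular population on (0, 1). It then compares two quantities:

- the variance of the means, exactly 1/24 (random) and 1/96 (stratified) according to (6) and (5);
- the mean standard deviation, exactly 1/6 (random) and 1/4 (stratified) according to (8) and (7).

The seed and sample size are arbitrary.

```python
import random
from statistics import fmean, pvariance

def simulate(stratified, n_samples, rng):
    """Means and standard deviations (divisor 2) of samples of two from U(0, 1)."""
    means, sds = [], []
    for _ in range(n_samples):
        if stratified:
            x1, x2 = rng.uniform(0.0, 0.5), rng.uniform(0.5, 1.0)  # one per half
        else:
            x1, x2 = rng.random(), rng.random()
        means.append((x1 + x2) / 2)
        sds.append(abs(x2 - x1) / 2)
    return means, sds

rng = random.Random(3)
rand_means, rand_sds = simulate(False, 100_000, rng)
strat_means, strat_sds = simulate(True, 100_000, rng)

print(pvariance(rand_means), pvariance(strat_means))  # near 1/24 and 1/96
print(fmean(rand_sds), fmean(strat_sds))              # near 1/6 and 1/4
```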





If we take a random sample of two elements from the rectangular population on the interval $-\frac12 < x < \frac12$, then Student's ratio $t = \xi/S$ will have the distribution

$F(t) = \dfrac{1}{2(1-t)^2}$, for $t < 0$,
$F(t) = \dfrac{1}{2(1+t)^2}$, for $t > 0$.

This result was obtained by Laderman [7] and others. Following Laderman's reasoning, consider stratified samples of two elements, one from each half of the population, and let $-1 < t < 0$. The probability of getting a value of $t$ less than the value on OS is

$4\displaystyle\int_0^{\frac{1+t}{2(1-t)}}dx_2\int_{-\frac12}^{-x_2\frac{1-t}{1+t}}dx_1 = -\frac12\,\frac{t+1}{t-1}.$

When $t > 0$, the probability of obtaining a value of $t$ greater than the value on OS is

$4\displaystyle\int_0^{\frac12}dx_2\int_{-x_2\frac{1-t}{1+t}}^{0}dx_1 = \frac12\,\frac{1-t}{1+t}.$

It follows easily that, for stratified samples, the probability of getting a value of $t$ less than the value represented on OS is

$1 - 4\displaystyle\int_0^{\frac12}dx_2\int_{-x_2\frac{1-t}{1+t}}^{0}dx_1 = 1 + \frac12\,\frac{t-1}{t+1}.$

Differentiating the right-hand sides of the first and third of these equations with respect to $t$, we find the distribution of Student's ratio for stratified samples of two elements from a rectangular population:

$f(t) = \dfrac{1}{(1-t)^2}$, for $-1 < t < 0$,
$f(t) = \dfrac{1}{(1+t)^2}$, for $0 < t < 1$.

Comparing the random sample and stratified sample distributions of $t$, we find that

A. The stratified t's are more stable than the random t's.

B. Both distributions are symmetrical.

C. The range for the stratified t's is $-1 < t < +1$, while the range for the t's obtained from random samples is $-\infty < t < +\infty$.
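The two distribution functions of $t$ derived above can be compared with a simulation. The sketch below codes the distribution functions obtained by integrating $F(t)$ and the stratified density. It then estimates $P(t \le \frac12)$ in two ways: by drawing random samples from $(-\frac12, \frac12)$, and by drawing stratified samples with one element from each half. The seed, the sample size and the point $t = \frac12$ are arbitrary choices.

```python
import random

def cdf_t_random(t):
    """Distribution function of t = xi/S, random samples of two from U(-1/2, 1/2)."""
    return 1 / (2 * (1 - t)) if t < 0 else 1 - 1 / (2 * (1 + t))

def cdf_t_stratified(t):
    """Same, with one element from each half of the population (range -1 < t < 1)."""
    if t <= -1:
        return 0.0
    if t >= 1:
        return 1.0
    return (1 + t) / (2 * (1 - t)) if t < 0 else 1 + (t - 1) / (2 * (t + 1))

def t_ratio(x1, x2):
    # mean / standard deviation (divisor 2) = (x1 + x2) / |x2 - x1|
    return (x1 + x2) / abs(x2 - x1)

rng = random.Random(4)
N, c = 100_000, 0.5
hits_random = hits_strat = 0
for _ in range(N):
    x1, x2 = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
    if x1 != x2 and t_ratio(x1, x2) <= c:
        hits_random += 1
    y1, y2 = rng.uniform(-0.5, 0.0), rng.uniform(0.0, 0.5)  # y1 < 0 <= y2
    if t_ratio(y1, y2) <= c:
        hits_strat += 1
print(hits_random / N, cdf_t_random(c))     # both about 0.667
print(hits_strat / N, cdf_t_stratified(c))  # both about 0.833
```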

Distributions of means of stratified samples will now be obtained by a different method. Let (A) and (B) be rectangular populations $f(x)$, $f(y)$ respectively, with positive values on the interval (0, 1). From the rectangular population (A) select a stratified sample of two elements $x_1$ and $x_2$ such that $0 < x_1 < \frac12$, $\frac12 < x_2 < 1$. Then the probability of getting a sample point in the element of area $dx_1\,dx_2$ is $4\,dx_1\,dx_2$.

Now let $y_1 = 2x_1$ (a change of unit of measurement) and $y_2 = 2x_2 - 1$ (a change of unit of measurement and a translation). Then $4\,dx_1\,dx_2 = dy_1\,dy_2$, and $0 < y_1 < 1$, $0 < y_2 < 1$. With respect to the distribution of the means, a stratified sample of two elements from (A) is therefore the same as a random sample of two elements from (B). Now the means for random samples of two elements from (B) have the distribution $g(\bar y)$, which is given by (6) with $\bar y$ substituted for $\xi$. Furthermore, we have $\bar x = \frac12\bar y + \frac14$. Hence it follows readily that

$g(\bar x) = 16\bar x - 4$, for $\frac14 < \bar x < \frac12$,
$g(\bar x) = 12 - 16\bar x$, for $\frac12 < \bar x < \frac34$.

From the rectangular population (A) take a stratified sample of three elements, $0 < x_1 < \tfrac13$, $\tfrac13 < x_2 < \tfrac23$, $\tfrac23 < x_3 < 1$. The sample points will all lie in a cube within the unit cube. Then the probability of getting a sample point in the element of volume $dx_1\,dx_2\,dx_3$ is $27\,dx_1\,dx_2\,dx_3$. Now let $y_1 = 3x_1$, $y_2 = 3x_2 - 1$, $y_3 = 3x_3 - 2$. Then $0 < y_i < 1$ for $i = 1, 2, 3$, and $dy_1\,dy_2\,dy_3 = 27\,dx_1\,dx_2\,dx_3$. With respect to the distribution of the means, a stratified sample of three elements from (A) is thus the same as a random sample of three elements from (B). Now the means for random samples of three elements from (B) have the distribution


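$$g(\bar y) = \begin{cases} \dfrac{27\bar y^2}{2}, & 0 < \bar y < \tfrac13, \\[4pt] \dfrac{9(6\bar y - 6\bar y^2 - 1)}{2}, & \tfrac13 < \bar y < \tfrac23, \\[4pt] \dfrac{27(1 - \bar y)^2}{2}, & \tfrac23 < \bar y < 1. \end{cases}$$

Since $\bar y = 3\bar x - 1$ and $d\bar y = 3\,d\bar x$, the values $\bar y = 0, \tfrac13, \tfrac23, 1$ correspond to $\bar x = \tfrac13, \tfrac49, \tfrac59, \tfrac23$ respectively. Hence

$$h(\bar x) = \begin{cases} \dfrac{81(3\bar x - 1)^2}{2}, & \tfrac13 < \bar x < \tfrac49, \\[4pt] \dfrac{27(54\bar x - 54\bar x^2 - 13)}{2}, & \tfrac49 < \bar x < \tfrac59, \\[4pt] \dfrac{81(2 - 3\bar x)^2}{2}, & \tfrac59 < \bar x < \tfrac23. \end{cases}$$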


Thus we have found the distribution of the means for stratified samples of three elements when one element is selected from each third of the population.

From the rectangular population (A), take a stratified sample of four elements, $0 < x_1 < \tfrac14$, $\tfrac14 < x_2 < \tfrac12$, $\tfrac12 < x_3 < \tfrac34$, $\tfrac34 < x_4 < 1$. Again, a stratified sample of four elements from (A) is, with respect to the distribution of means, the same as a random sample of four elements from (B). The means for random samples of four elements from (B) have the distribution:

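$$g(\bar y) = \begin{cases} \dfrac{128\bar y^3}{3}, & 0 < \bar y < \tfrac14, \\[4pt] \dfrac{8}{3}\left[1 - 24(\bar y - \tfrac12)^2 - 48(\bar y - \tfrac12)^3\right], & \tfrac14 < \bar y < \tfrac12, \\[4pt] \dfrac{8}{3}\left[1 - 24(\bar y - \tfrac12)^2 + 48(\bar y - \tfrac12)^3\right], & \tfrac12 < \bar y < \tfrac34, \\[4pt] \dfrac{128(1 - \bar y)^3}{3}, & \tfrac34 < \bar y < 1. \end{cases}$$

Since $\bar y = 4\bar x - \tfrac32$ and $d\bar y = 4\,d\bar x$, the values $\bar y = 0, \tfrac14, \tfrac12, \tfrac34, 1$ correspond to $\bar x = \tfrac38, \tfrac{7}{16}, \tfrac12, \tfrac{9}{16}, \tfrac58$ respectively. Hence

$$h(\bar x) = \begin{cases} \dfrac{512(4\bar x - \tfrac32)^3}{3}, & \tfrac38 < \bar x < \tfrac{7}{16}, \\[4pt] \dfrac{32}{3}\left[1 - 24(4\bar x - 2)^2 - 48(4\bar x - 2)^3\right], & \tfrac{7}{16} < \bar x < \tfrac12, \\[4pt] \dfrac{32}{3}\left[1 - 24(4\bar x - 2)^2 + 48(4\bar x - 2)^3\right], & \tfrac12 < \bar x < \tfrac{9}{16}, \\[4pt] \dfrac{512(\tfrac52 - 4\bar x)^3}{3}, & \tfrac{9}{16} < \bar x < \tfrac58. \end{cases}$$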





This is the distribution of the means for stratified samples of four elements (one element from each quartile). We can extend the method to stratified samples of size $n$ in which one element is selected from each stratum and there are $n$ strata. As $n$ increases, we note that

A. The range of the means decreases.

B. The scatter of the means decreases.

C. The number of arcs in the distribution of the means increases.
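The one-element-per-stratum reduction used above works for any $n$: the stratified mean is a linear function of the mean of $n$ independent rectangular variables, whose sum has the Irwin-Hall density (a standard result not stated in the text). The Python sketch below, with hypothetical function names, evaluates the density this way and reproduces one value from each of the three cases worked out above. It also checks numerically that the range is $1/n$ wide and that the variance is $1/(12n^3)$.

```python
from math import comb, factorial, floor

def irwin_hall_pdf(s, n):
    """Density at s of the sum of n independent U(0, 1) variables."""
    if s <= 0 or s >= n:
        return 0.0
    total = sum((-1) ** k * comb(n, k) * (s - k) ** (n - 1)
                for k in range(floor(s) + 1))
    return total / factorial(n - 1)

def stratified_mean_pdf(xbar, n):
    """Density of the mean of a stratified sample from U(0, 1) with one
    element from each of n equal strata.

    With y_i = n*x_i - (i - 1) the y_i are independent U(0, 1), and
    sum(y_i) = n**2 * xbar - n*(n - 1)/2, so the Jacobian is n**2.
    """
    s = n * n * xbar - n * (n - 1) / 2
    return n * n * irwin_hall_pdf(s, n)

def integrate(f, a, b, steps=20_000):
    """Midpoint rule, accurate enough for these piecewise polynomials."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# The hand-derived pieces in the text, evaluated at one point each.
print(stratified_mean_pdf(0.40, 2), 16 * 0.40 - 4)                         # n = 2
print(stratified_mean_pdf(0.50, 3), 27 * (54 * 0.5 - 54 * 0.25 - 13) / 2)  # n = 3
print(stratified_mean_pdf(0.50, 4), 32 / 3)                                # n = 4

for n in range(2, 6):
    lo, hi = (n - 1) / (2 * n), (n + 1) / (2 * n)     # range has width 1/n
    var = integrate(lambda x: (x - 0.5) ** 2 * stratified_mean_pdf(x, n), lo, hi)
    print(n, hi - lo, var, 1 / (12 * n ** 3))
```

Each density has $n$ polynomial pieces, matching point C, while the range and variance shrink as in points A and B.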

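Take the stratified sample of four elements, two elements from each half: $0 < x_1 < \tfrac12$, $0 < x_2 < \tfrac12$, $\tfrac12 < x_3 < 1$, $\tfrac12 < x_4 < 1$. With respect to the distribution of the means, a stratified sample of four elements from (A) is the same as a random sample of four elements from (B). Now the means for random samples of four elements from (B) have the distribution $g(\bar y)$ given above. Furthermore, $\bar y = 2\bar x - \tfrac12$, $d\bar y = 2\,d\bar x$, and for $\bar y = 0, \tfrac14, \tfrac12, \tfrac34, 1$ we have $\bar x = \tfrac14, \tfrac38, \tfrac12, \tfrac58, \tfrac34$. Thus

$$h(\bar x) = \begin{cases} \dfrac{256(2\bar x - \tfrac12)^3}{3}, & \tfrac14 < \bar x < \tfrac38, \\[4pt] \dfrac{16}{3}\left[1 - 24(2\bar x - 1)^2 - 48(2\bar x - 1)^3\right], & \tfrac38 < \bar x < \tfrac12, \\[4pt] \dfrac{16}{3}\left[1 - 24(2\bar x - 1)^2 + 48(2\bar x - 1)^3\right], & \tfrac12 < \bar x < \tfrac58, \\[4pt] \dfrac{256(\tfrac32 - 2\bar x)^3}{3}, & \tfrac58 < \bar x < \tfrac34. \end{cases}$$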


Hence the distribution of the means for stratified samples of four elements (two elements from each half) has been found.

If we take a stratified sample of six elements (three elements from each half), we find that the graph of the distribution function of the means consists of six arcs; the range will be $\tfrac14 < \bar x < \tfrac34$. Thus we see that as we take more elements from each half, the distribution becomes smoother, and the number of arcs in the distribution of the means increases. The range of the means remains the same, but the scatter decreases as we take more elements from each half of the population (A).
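The decrease in scatter can be quantified for the rectangular population. Assuming $k$ equal strata with $m$ elements drawn from each, every element has variance $1/(12k^2)$ within its stratum, so the stratified mean has variance $1/(12k^3m)$, against $1/(12km)$ for a random sample of the same size $km$. The small Python sketch below (hypothetical helper names) compares the exact value with a simulation for $k = 2$, i.e. one, two and three elements from each half.

```python
import random
from statistics import pvariance

def exact_var_stratified(k, m):
    """Variance of the mean when m elements are drawn from each of k equal
    strata of U(0, 1): each element has variance 1/(12 k^2) within its stratum,
    and the mean of k*m independent elements divides the summed variance by (k*m)^2."""
    return 1 / (12 * k**2 * k * m)

def exact_var_random(k, m):
    """Variance of the mean of a random sample of k*m elements from U(0, 1)."""
    return 1 / (12 * k * m)

def simulated_var_stratified(k, m, reps, rng):
    """Monte Carlo estimate of the variance of the stratified mean."""
    means = []
    for _ in range(reps):
        total = sum((j + rng.random()) / k for j in range(k) for _ in range(m))
        means.append(total / (k * m))
    return pvariance(means)

rng = random.Random(0)
for m in (1, 2, 3):   # k = 2 halves: stratified samples of 2, 4 and 6 elements
    print(m, exact_var_stratified(2, m), simulated_var_stratified(2, m, 20_000, rng),
          exact_var_random(2, m))
```

For $k = 2$ the stratified variance is one quarter of the random-sample variance at every sample size, while the range $\tfrac14 < \bar x < \tfrac34$ stays fixed.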

The results so far obtained hold for the rectangular population, which is symmetric. In order to make further comparisons of the distributions of means and standard deviations for stratified and random samples, let us now consider a triangular, J-shaped population. Let the population be $f(x) = 2(1 - x)$ for $0 \le x \le 1$, and $f(x) = 0$ elsewhere. If we take random samples of two elements from this population, the points represented by each sample will lie […] population. The other element is selected from the range $\tfrac12 < x_2 < 1$, which constitutes one quarter of the total population. By use of the geometric method the distribution of the stratified means is found to be

$$g(\bar x) = \begin{cases} \dfrac{16(32\bar x^3 - 96\bar x^2 + 66\bar x - 11)}{9}, & \tfrac14 < \bar x < \tfrac12, \\[4pt] \dfrac{16(27 - 90\bar x + 96\bar x^2 - 32\bar x^3)}{9}, & \tfrac12 < \bar x < \tfrac34. \end{cases}$$
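The density of the stratified means can be checked against a simulation. This Python sketch assumes the population $f(x) = 2(1 - x)$ on $(0, 1)$ and the strata $(0, \tfrac12)$ and $(\tfrac12, 1)$ as above, and samples each stratum by inverting the truncated distribution function. It then compares $P(\bar x \le \tfrac12) = 13/18$ from the formula with the simulated proportion and compares the variances of stratified and random means. The helper names are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance

def stratified_mean_pdf(xbar):
    """Density of (x1 + x2)/2 given in the text for f(x) = 2(1 - x) on (0, 1),
    with x1 drawn from (0, 1/2) and x2 from (1/2, 1)."""
    if 0.25 < xbar <= 0.5:
        return 16 * (32 * xbar**3 - 96 * xbar**2 + 66 * xbar - 11) / 9
    if 0.5 < xbar < 0.75:
        return 16 * (27 - 90 * xbar + 96 * xbar**2 - 32 * xbar**3) / 9
    return 0.0

# Inverse-CDF sampling: F(x) = 2x - x^2, so x = 1 - sqrt(1 - u).
def draw(rng):                      # whole population
    return 1 - math.sqrt(1 - rng.random())

def draw_lower(rng):                # stratum (0, 1/2): u in [0, 3/4)
    return 1 - math.sqrt(1 - 0.75 * rng.random())

def draw_upper(rng):                # stratum (1/2, 1): u in [3/4, 1)
    return 1 - 0.5 * math.sqrt(1 - rng.random())

rng = random.Random(49)
N = 200_000
strat = [(draw_lower(rng) + draw_upper(rng)) / 2 for _ in range(N)]
rand = [(draw(rng) + draw(rng)) / 2 for _ in range(N)]

h = 1e-5   # midpoint rule for P(xbar <= 1/2) from the density
p_formula = h * sum(stratified_mean_pdf(0.25 + (i + 0.5) * h) for i in range(25_000))
p_sim = sum(x <= 0.5 for x in strat) / N
print(p_formula, p_sim)                      # both near 13/18
print(min(strat), max(strat), min(rand), max(rand))
print(pvariance(strat), pvariance(rand))     # about 11/1296 versus 1/36
# Equal allocation to strata of probability 3/4 and 1/4 centres the
# stratified mean at 4/9 rather than at the population mean 1/3.
print(fmean(strat))
```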

The range of the stratified means is smaller, and their distribution is more nearly symmetrical than that of the random means, as may be seen by comparing the graphs of the two distribution functions. Thus we see that stratification gives the means greater stability. The distribution of the standard deviations of the stratified samples of two elements is:

$$h(S) = \begin{cases} \dfrac{64(3S - 8S^3)}{9}, & 0 < S < \tfrac14, \\[4pt] \dfrac{128(4S^3 - 3S + 1)}{9}, & \tfrac14 < S < \tfrac12. \end{cases}$$

On comparing the distributions of the standard deviations for random and stratified samples, we observe that the random case yields a single cubic whereas the stratified case yields two cubics. The distribution obtained for the stratified case is more symmetrical than that for the random case, as may be seen by sketching the graphs of the two distribution functions. The range for both distributions is the same.
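The same kind of check applies to the distribution of the standard deviations. The sketch below uses the same assumptions as for the means (hypothetical names) and takes $S = (x_2 - x_1)/2$, i.e. divisor 2. It confirms that the two cubics integrate to 1 and give $P(S \le \tfrac14) = 11/18$, and compares that value with a simulation.

```python
import math
import random

def stratified_sd_pdf(s):
    """Density h(S) given in the text for S = (x2 - x1)/2, with x1 from (0, 1/2)
    and x2 from (1/2, 1) of the population f(x) = 2(1 - x)."""
    if 0 < s <= 0.25:
        return 64 * (3 * s - 8 * s**3) / 9
    if 0.25 < s < 0.5:
        return 128 * (4 * s**3 - 3 * s + 1) / 9
    return 0.0

def draw_lower(rng):   # inverse CDF of f restricted to (0, 1/2)
    return 1 - math.sqrt(1 - 0.75 * rng.random())

def draw_upper(rng):   # inverse CDF of f restricted to (1/2, 1)
    return 1 - 0.5 * math.sqrt(1 - rng.random())

def midpoint(f, a, b, steps=50_000):
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

rng = random.Random(50)
N = 200_000
sds = [(draw_upper(rng) - draw_lower(rng)) / 2 for _ in range(N)]

print(midpoint(stratified_sd_pdf, 0.0, 0.5))          # total probability
print(midpoint(stratified_sd_pdf, 0.0, 0.25),          # P(S <= 1/4) from h(S)
      sum(s <= 0.25 for s in sds) / N)                 # and from the simulation
```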

3. Sampling from a normal population. We shall consider a normal population $F$ having the frequency function $e^{-x^2/2}/\sqrt{2\pi}$, $(-\infty < x < \infty)$, and the $i$th moment about the mean will be $\mu_i$. Divide this population into two equal parts $F_1$ and $F_2$ such that the frequency function of $F_1$ is $2e^{-x^2/2}/\sqrt{2\pi}$, $(-\infty < x < 0)$, and the frequency function of $F_2$ is $2e^{-x^2/2}/\sqrt{2\pi}$, $(0 < x < \infty)$. The $i$th moment of $F_1$ about the origin will be $m_{i1}$, while the $i$th moment about its mean will be $\mu_{i1}$; the corresponding $i$th moments for $F_2$ will be $m_{i2}$ and $\mu_{i2}$ respectively. In what follows $M'_i$ will be the $i$th moment about the origin of the distribution sought, while $M_i$ will be the $i$th moment about the mean. Furthermore, the constants $\beta_1$, $\beta_2$, $\kappa$, and $Sk$ (measure of skewness) used here are defined in Elderton [8]. Finally, $E[f(x)]$ will denote the expected value of $f(x)$.

If we take a random sample of $n$ elements $x'_1, x'_2, \dots, x'_n$ from $F_1$ and a random sample of $n$ elements $x''_1, \dots, x''_n$ from $F_2$, the $2n$ elements $x'_1, \dots, x'_n, x''_1, \dots, x''_n$ will be a stratified sample from the population $F$. Let

$$\bar x' = \frac{1}{n}\sum_{i=1}^{n} x'_i, \qquad \bar x'' = \frac{1}{n}\sum_{i=1}^{n} x''_i, \qquad \bar x = \tfrac12(\bar x' + \bar x'');$$

then $\bar x$ will be the mean of the stratified sample. By using Tchouproff's [6] formulae and expected values, we obtain the following values:

$$M'_1 = E(\bar x) = \tfrac12(m_{11} + m_{12}) = 0,$$

$$M_2 = E(\bar x^2) = \frac{\mu_{21} + \mu_{22}}{4n} = \frac{1 - m_{11}^2}{2n},$$





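$$M_3 = E(\bar x^3) = \frac{\mu_{31} + \mu_{32}}{8n^2} = 0,$$

$$M_4 = E(\bar x^4) = \frac{\mu_{41} + \mu_{42} + 3(n-1)(\mu_{21}^2 + \mu_{22}^2) + 6n\,\mu_{21}\mu_{22}}{16n^3},$$

$$\beta_1 = \frac{M_3^2}{M_2^3} = 0, \qquad \beta_2 = \frac{M_4}{M_2^2} = 3 + \frac{\mu_{41} - 3\mu_{21}^2}{2n\,\mu_{21}^2},$$

where the last step uses $\mu_{41} = \mu_{42}$ and $\mu_{21} = \mu_{22}$.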


From these constants we see that the variance of the stratified means is $(1 - 2/\pi)/2n$ (since $m_{11}^2 = 2/\pi$), whereas the variance of random means of $2n$ elements is, as is well known, $1/2n$. Thus the scatter of the stratified means is less than the scatter of the random means. Furthermore, the stratified means are distributed symmetrically, since $M_3 = 0$. Observing $\beta_2$, we notice that the distribution of the stratified means is slightly more peaked than normal. Since it is well known that random means from a normal population are normally distributed, the differences between the two distributions are easy to see. As $n \to \infty$, $\beta_2 \to 3$, so it is reasonably likely that the stratified means tend to be normally distributed as the size of the sample increases.
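The moments above can be evaluated with the half-normal constants $m_{12} = \sqrt{2/\pi}$, $\mu_{21} = 1 - 2/\pi$ and $\mu_{41} = 3 - 2m_{12}^2 - 3m_{12}^4$ (standard results for $|Z|$, not listed in the text). The Python sketch below, with hypothetical names, computes the variance and $\beta_2$ of the stratified mean and compares them with a Monte Carlo estimate. With these constants, $\beta_2 \approx 3 + 0.4346/n$.

```python
import math
import random

# Moments of the half-normal parts (F2 shown; F1 is its mirror image).
m = math.sqrt(2 / math.pi)             # m12 = -m11
mu2 = 1 - 2 / math.pi                  # mu21 = mu22 = 1 - m11^2
mu4 = 3 - 2 * m**2 - 3 * m**4          # mu41 = mu42, from E|Z|^k = 1, 2m, 3 for k = 2, 3, 4

def var_stratified(n):
    """M2 = (1 - m11^2)/2n for n elements from each half."""
    return mu2 / (2 * n)

def beta2_stratified(n):
    """beta2 = 3 + (mu41 - 3 mu21^2) / (2 n mu21^2)."""
    return 3 + (mu4 - 3 * mu2**2) / (2 * n * mu2**2)

def simulate(n, reps, rng):
    """Variance and beta2 of simulated stratified means."""
    means = []
    for _ in range(reps):
        lower = sum(-abs(rng.gauss(0, 1)) for _ in range(n)) / n   # from F1
        upper = sum(abs(rng.gauss(0, 1)) for _ in range(n)) / n    # from F2
        means.append((lower + upper) / 2)
    c = sum(means) / reps
    m2 = sum((x - c) ** 2 for x in means) / reps
    m4 = sum((x - c) ** 4 for x in means) / reps
    return m2, m4 / m2**2

rng = random.Random(5)
for n in (1, 2):
    sim_var, sim_b2 = simulate(n, 60_000, rng)
    print(n, var_stratified(n), sim_var, 1 / (2 * n), beta2_stratified(n), sim_b2)
```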

If we select a random sample $(x, y)$ of two elements from the normal population $F$, then the variance ($S^2$) will be

$$S^2 = \tfrac12(x^2 + y^2) - \left(\frac{x + y}{2}\right)^2 = \frac{(x - y)^2}{4}.$$

The method of expectations gives us the following values:

$$M'_1 = \tfrac12, \quad M_2 = \tfrac12, \quad M_3 = 1, \quad \beta_1 = 8, \quad \beta_2 = 15.$$

Therefore, the skewness of this distribution as measured by Elderton's formula is 1.414. For a stratified sample, where we select $x$ from $F_1$ and $y$ from $F_2$, the second, third, and fourth central moments of $S^2$ are:

$$M_2 = \frac{\pi^2 + 2\pi - 2}{2\pi^2},$$

$$M_3 = \frac{4\pi^3 + 7\pi^2 - 12\pi + 8}{4\pi^3},$$

$$M_4 = \frac{15\pi^4 + 30\pi^3 - 40\pi^2 + 24\pi - 12}{4\pi^4}.$$

It follows easily that $\beta_1 = 4.7100$, $\beta_2 = 10.2842$, $\kappa = 17.55$, $2\beta_2 - 3\beta_1 - 6 = 0.4385$, and $Sk = 1.02$. For samples of two elements, the stratified samples yield a distribution for the variance which is less skewed than the corresponding distribution of the variances for random samples. The variances for random samples of two elements are distributed as a Type III curve, while the variances for stratified samples of two elements follow either a Type III or a Type VI curve. Thus the distinction between the random case and the stratified case, seen from this point of view, is not clear cut.
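The constants quoted above follow from the three central moments by arithmetic, and the Python sketch below recomputes them. It assumes Elderton's definitions $\kappa = \beta_1(\beta_2 + 3)^2/[4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)]$ and $Sk = \sqrt{\beta_1}(\beta_2 + 3)/[2(5\beta_2 - 6\beta_1 - 9)]$, and adds a simulation check of $M_2$ and $M_3$. For random samples ($\beta_1 = 8$, $\beta_2 = 15$) the same $Sk$ formula gives $\sqrt2 \approx 1.414$. Names are hypothetical.

```python
import math
import random

pi = math.pi
# Central moments of S^2 = (x - y)^2 / 4 with x from F1 and y from F2 (see above).
M2 = (pi**2 + 2 * pi - 2) / (2 * pi**2)
M3 = (4 * pi**3 + 7 * pi**2 - 12 * pi + 8) / (4 * pi**3)
M4 = (15 * pi**4 + 30 * pi**3 - 40 * pi**2 + 24 * pi - 12) / (4 * pi**4)

def pearson_kappa(b1, b2):
    """Pearson's criterion kappa (assumed to match Elderton's definition)."""
    return b1 * (b2 + 3) ** 2 / (4 * (4 * b2 - 3 * b1) * (2 * b2 - 3 * b1 - 6))

def skewness_sk(b1, b2):
    """Skewness Sk = sqrt(b1)(b2 + 3) / (2(5 b2 - 6 b1 - 9))."""
    return math.sqrt(b1) * (b2 + 3) / (2 * (5 * b2 - 6 * b1 - 9))

beta1, beta2 = M3**2 / M2**3, M4 / M2**2
kappa, sk = pearson_kappa(beta1, beta2), skewness_sk(beta1, beta2)
print(f"beta1={beta1:.4f} beta2={beta2:.4f} kappa={kappa:.2f} "
      f"2b2-3b1-6={2 * beta2 - 3 * beta1 - 6:.4f} Sk={sk:.3f}")
# For random samples kappa is infinite, since 2*b2 - 3*b1 - 6 = 0 (Type III).
print(f"random samples: Sk={skewness_sk(8, 15):.3f}")

# Monte Carlo check of M2 and M3: x = -|Z1|, y = |Z2|, so (x - y)^2 = (|Z1| + |Z2|)^2.
rng = random.Random(1414)
N = 200_000
vals = [(abs(rng.gauss(0, 1)) + abs(rng.gauss(0, 1))) ** 2 / 4 for _ in range(N)]
c = sum(vals) / N
sim_M2 = sum((v - c) ** 2 for v in vals) / N
sim_M3 = sum((v - c) ** 3 for v in vals) / N
print(sim_M2, M2, sim_M3, M3)
```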

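Let us now see what sort of bias is produced by taking $n$ elements of the sample from $F_1$ and $2n$ elements of the sample from $F_2$. Under these circumstances, the complete sample will contain $3n$ elements, and the mean of the sample will be $\bar x = \sum x_i/3n = (\bar x_1 + 2\bar x_2)/3$, where $\bar x_1$ and $\bar x_2$ are the means of the parts drawn from $F_1$ and $F_2$. As before, the central moments and the $\beta$'s are found to be (using $\mu_{21} = \mu_{22}$, $\mu_{41} = \mu_{42}$, and $\mu_{31} = -\mu_{32}$):

$$M_2 = \frac{\mu_{21} + 2\mu_{22}}{9n} = \frac{\mu_{21}}{3n}, \qquad M_3 = \frac{\mu_{31} + 2\mu_{32}}{27n^2} = \frac{\mu_{32}}{27n^2},$$

$$M_4 = \frac{\mu_{41} + (9n - 3)\mu_{21}^2}{27n^3},$$

$$\beta_1 = \frac{\mu_{32}^2}{27n\,\mu_{21}^3}, \qquad \beta_2 = \frac{\mu_{41}}{3n\,\mu_{21}^2} - \frac{1}{n} + 3.$$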

We notice first that the means are not symmetrically distributed for small values of $n$, since $\beta_1 \ne 0$; but as $n \to \infty$, $\beta_1 \to 0$, so the means tend to be symmetrically distributed. It is evident also that $\beta_2 \to 3$ with increasing $n$; consequently, the bias which is present for small values of $n$ tends to disappear as $n$ increases. Incorrect proportioning of the sizes of the partial samples in stratified sampling introduces an error into the results whose magnitude decreases with an increase in $n$.
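For the disproportionate allocation, the behaviour of $\beta_1$ and $\beta_2$ as $n$ grows can be tabulated directly from the formulas above. The half-normal constants are used again, with $\mu_{32} = m_{12}(2m_{12}^2 - 1)$, a standard result assumed here. Below is a Python sketch with hypothetical names, including a Monte Carlo check for $n = 1$.

```python
import math
import random

m = math.sqrt(2 / math.pi)
mu2 = 1 - 2 / math.pi                  # mu21 = mu22
mu3 = m * (2 * m**2 - 1)               # mu32 = -mu31 (third central moment of F2)
mu4 = 3 - 2 * m**2 - 3 * m**4          # mu41 = mu42

def beta1_disproportionate(n):
    """beta1 = mu32^2 / (27 n mu21^3) for n elements from F1 and 2n from F2."""
    return mu3**2 / (27 * n * mu2**3)

def beta2_disproportionate(n):
    """beta2 = mu41 / (3 n mu21^2) - 1/n + 3."""
    return mu4 / (3 * n * mu2**2) - 1 / n + 3

for n in (1, 2, 5, 10, 100):
    print(n, round(beta1_disproportionate(n), 5), round(beta2_disproportionate(n), 5))

# Monte Carlo check for n = 1: one element from F1, two from F2.
rng = random.Random(3)
means = [(-abs(rng.gauss(0, 1)) + abs(rng.gauss(0, 1)) + abs(rng.gauss(0, 1))) / 3
         for _ in range(100_000)]
c = sum(means) / len(means)
m2, m3, m4 = (sum((x - c) ** k for x in means) / len(means) for k in (2, 3, 4))
print(m3 / m2**1.5, math.sqrt(beta1_disproportionate(1)))   # signed skewness
print(m4 / m2**2, beta2_disproportionate(1))
```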


6. Sampling from a population $y = \phi(x)$. Suppose we have a well-behaved frequency function $\phi(x)$ of which the first four moments are finite. Furthermore, it will be required that $\phi(x)$ be continuous and Riemann-integrable. Divide the total $x$-axis into $K$ parts $I_t$, with the separating points $a_1, a_2, \dots, a_{K-1}$ chosen in such a manner that

$$\int_{-\infty}^{a_1} \phi(x)\,dx = \cdots = \int_{a_{K-1}}^{\infty} \phi(x)\,dx = \frac{1}{K}.$$

In this section we extend some of the definitions of the last section: $\mu_{it}$ will be the $i$th moment about the mean of the $t$th part $I_t$, and $m_{it}$ will be the $i$th moment about the origin of the $t$th part $I_t$. Take a sample of $Kn$ elements from this population so that $n$ elements are drawn from each part. The mean of this sample will be $\bar x = \sum_{i=1}^{Kn} x_i/Kn$. We write this as $\bar x = \sum_{t=1}^{K} \bar x_t/K$, where $\bar x_t = \sum_{i=1}^{n} x_{it}/n$. It then follows easily that

$$M'_1 = \frac{1}{K}\sum_{t=1}^{K} m_{1t}, \qquad M_2 = \frac{1}{K^2 n}\sum_{t=1}^{K} \mu_{2t}, \qquad M_3 = \frac{1}{K^3 n^2}\sum_{t=1}^{K} \mu_{3t},$$

$$M_4 = \frac{1}{K^4 n^3}\left[\sum_{t=1}^{K} \mu_{4t} + 3n\Big(\sum_{t=1}^{K} \mu_{2t}\Big)^2 - 3\sum_{t=1}^{K} \mu_{2t}^2\right],$$

so that

$$\beta_1 = \frac{\big(\sum_t \mu_{3t}\big)^2}{n\big(\sum_t \mu_{2t}\big)^3}, \qquad \beta_2 = 3 + \frac{\sum_t \mu_{4t} - 3\sum_t \mu_{2t}^2}{n\big(\sum_t \mu_{2t}\big)^2}.$$

As $n \to \infty$, $\beta_1 \to 0$ and $\beta_2 \to 3$. Therefore, if we divide a population into $K$ equal parts and take a sample of $Kn$ elements ($n$ elements from each part), the distribution of the means probably tends to normal as the number of elements in the sample increases.
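For a normal $\phi(x)$ the formula for $M_2$ can be evaluated exactly, since the within-stratum variances sum to the total variance minus the between-strata variance of the stratum means. The Python sketch below (hypothetical names; it uses `statistics.NormalDist` from the standard library) computes $M_2$ for $K$ equal-probability strata and recovers $(1 - 2/\pi)/2n$ for $K = 2$. It also checks one case by simulation.

```python
import math
import random
from statistics import NormalDist, pvariance

STD = NormalDist()   # the standard normal population phi(x)

def exact_M2(K, n):
    """M2 = sum(mu_2t) / (K^2 n) for K equal-probability strata of N(0, 1).

    Uses sum(mu_2t)/K = 1 - sum(m_1t^2)/K (total = within + between variance),
    with stratum means m_1t = K * (pdf(a_{t-1}) - pdf(a_t)).
    """
    cuts = [STD.inv_cdf(t / K) for t in range(1, K)]
    dens = [0.0] + [STD.pdf(a) for a in cuts] + [0.0]
    between = sum((K * (dens[t] - dens[t + 1])) ** 2 for t in range(K)) / K
    return (1 - between) / (K * n)

def simulated_M2(K, n, reps, rng):
    """Monte Carlo variance of the mean with n draws from each stratum."""
    means = []
    for _ in range(reps):
        total = 0.0
        for t in range(K):
            for _ in range(n):
                u = (t + rng.random()) / K
                u = min(max(u, 1e-12), 1 - 1e-12)   # keep inv_cdf inside (0, 1)
                total += STD.inv_cdf(u)
        means.append(total / (K * n))
    return pvariance(means)

rng = random.Random(6)
print(exact_M2(2, 2), (1 - 2 / math.pi) / (2 * 2))   # section 3 result, n = 2
for K in (1, 2, 4, 8):
    print(K, exact_M2(K, 2), 1 / (2 * K))           # stratified versus random, n = 2
print(simulated_M2(4, 2, 20_000, rng), exact_M2(4, 2))
```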


7. Summary. Distributions of means for stratified samples have been obtained for the rectangular population, which is symmetric, and also for a triangular population, which can be considered an example of a J-shaped population. For both populations […] greater stability than the […] obtained from random samples. The vari[…] obtained from the random […] sample means […] exhibit […].

The effect of stratification […] deviations is to […] distribution more symmetrical […] for the three populations.

For stratified samples from the rectangular population, Student's ratio is much more stable […].

Thus it is evident that stratified samples […] random samples […] stratified sample worthy of […].

The author is grateful to Professor Paul R. Rider for suggesting the problem of this paper and guiding it to its completion.

REFERENCES

[1] J. O. Irwin, "On the frequency distribution of the means of samples from a population having any law of frequency with finite moments, with special reference to Pearson's Type II," Biometrika, Vol. 19 (1927).

[2] Philip Hall, "The distribution of means for samples of size N drawn from a population in which the variate takes values between 0 and 1, all such values being equally probable," Biometrika, Vol. 19 (1927).

[3] Paul R. Rider, "On the distribution of the ratio of mean to standard deviation in small samples from non-normal universes," Biometrika, Vol. 21 (1929), pp. 124–143.

[4] J. Neyman, "On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection," Jour. Roy. Stat. Soc., Vol. 97 (1934).

[5] A. E. R. Church, "On the means and squared standard deviations of small samples from any population," Biometrika, Vol. 18 (1926), pp. 321–394.

[6] A. A. Tchouproff, "On the mathematical expectation of the moments of frequency distributions," Biometrika, Vol. 12 (1918–19).

[7] Jack Laderman, "The distribution of 'Student's' ratio for samples of two items drawn from non-normal universes," Annals of Math. Stat., Vol. 10 (1939).

[8] W. P. Elderton, Frequency Curves and Correlation, Second Edition, London, Charles and Edwin Layton, 1927, p. 239.



SOLUTION OF A MATHEMATICAL PROBLEM CONNECTED WITH
THE THEORY OF HEREDITY¹

By S. Bernstein

Translated by Emma Lehmer
University of California, Berkeley

Translator's Note. Although a French résumé of the article here translated appeared in the Comptes Rendus², it is so condensed, owing to space restrictions, that in reporting on Bernstein's work for the Statistical Seminar at the University of California it became necessary to refer to the original Russian paper. Because of the obvious language difficulty, together with the extreme rareness of the Ukrainian publication in this country³, and because of the current interest in the application of statistical theories to genetics, it seemed advisable to make this interesting article available to a much larger class of readers.

It must be noted that, owing to present conditions, it was impracticable to obtain the author's comments on this translation, and it is hoped that the slight changes and additions inserted in some of the more difficult passages would have met with his approval.

1. Let us consider $N$ classes of individuals which possess the property that the cross of any two individuals of these classes produces an individual belonging to one of these classes. Such a set of classes we shall call a "closed biotype." We will suppose that the probability $A_{ik}^{j}$ of obtaining an individual of class $j$ from the cross of individuals of classes $i$ and $k$ has some definite value. These probabilities, the heredity coefficients⁴ of the given biotype, satisfy

$$\sum_{j} A_{ik}^{j} = 1. \tag{1}$$

Let $\alpha_i$ be the probability that an individual belongs to class $i$; then under panmixia⁵ the probability $\alpha'_j$ of belonging to class $j$ in the next generation is given by

$$\alpha'_j = \sum_{i,k} A_{ik}^{j}\,\alpha_i\,\alpha_k. \tag{2}$$


¹ The original was published in the Annales Scientifiques de l'Ukraine, Vol. 1 (1924).

² C. R. Acad. Sci., Vol. 177 (1923), pp. 528–531, 581–584.

³ Thanks are due to the Brown University Mathematical Library for their loan of this publication.

⁴ That is, $A_{ik}^{j}$ is the relative probability that an offspring belongs to class $j$, given that the parents belong to classes $i$ and $k$.

⁵ That is, complete absence of selection.


Similarly we have for the next generation

$$\alpha''_j = \sum_{i,k} A_{ik}^{j}\,\alpha'_i\,\alpha'_k, \tag{3}$$

and so on.

The problem which we now propose is as follows:

For what heredity coefficients under panmixia will the distribution of probabilities achieved in the second generation remain unaltered in all subsequent generations?

If the heredity coefficients satisfy this condition, then the law of heredity which corresponds to them is called "stable."

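2. We prove first of all that the Mendelian law is stable. The Mendelian law has to do with three classes of individuals, the first two of which are pure races, while the third is a hybrid race such that the cross of an individual of the first class with an individual of the second class always produces an individual of the third class. It follows therefore that

$$A_{11}^{1} = A_{22}^{2} = A_{12}^{3} = 1, \quad\text{while}\quad A_{11}^{2} = A_{11}^{3} = A_{22}^{1} = A_{22}^{3} = A_{12}^{1} = A_{12}^{2} = 0.$$

The remaining 9 coefficients are defined as follows:

$$A_{13}^{1} = A_{13}^{3} = A_{23}^{2} = A_{23}^{3} = A_{33}^{3} = \tfrac12, \qquad A_{33}^{1} = A_{33}^{2} = \tfrac14, \quad\text{while}\quad A_{13}^{2} = A_{23}^{1} = 0.$$

If, for simplicity, we denote the probabilities of belonging to the first, second, or third class by $\alpha$, $\beta$, $\gamma$, then (2) becomes

$$\alpha' = (\alpha + \tfrac12\gamma)^2, \qquad \beta' = (\beta + \tfrac12\gamma)^2, \qquad \gamma' = 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma), \tag{4}$$

while on iteration we get the equivalent of (3), namely

$$\begin{aligned} \alpha'' &= \left[(\alpha + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)\right]^2 = (\alpha + \tfrac12\gamma)^2(\alpha + \beta + \gamma)^2, \\ \beta'' &= \left[(\beta + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)\right]^2 = (\beta + \tfrac12\gamma)^2(\alpha + \beta + \gamma)^2, \\ \gamma'' &= 2\left[(\alpha + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)\right]\left[(\beta + \tfrac12\gamma)^2 + (\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)\right] \\ &= 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma)(\alpha + \beta + \gamma)^2. \end{aligned} \tag{5}$$

Since $\alpha + \beta + \gamma = 1$, it follows that $\alpha'' = \alpha'$, $\beta'' = \beta'$, $\gamma'' = \gamma'$, and hence the Mendelian law is stable.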
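Equation (2) and the stability of the Mendelian law can be verified in exact rational arithmetic. This Python sketch encodes the coefficients $A_{ik}^j$ listed above (classes numbered 0, 1, 2 instead of 1, 2, 3), applies (2) twice, and confirms that the second generation repeats the first. The names are illustrative only.

```python
from fractions import Fraction as Fr
from itertools import product

# Mendelian heredity coefficients: A[(i, k)][j] is the probability that a cross
# of classes i and k yields class j (0, 1 = pure races, 2 = hybrid).
h, q = Fr(1, 2), Fr(1, 4)
BASE = {
    (0, 0): (1, 0, 0), (1, 1): (0, 1, 0), (0, 1): (0, 0, 1),
    (0, 2): (h, 0, h), (1, 2): (0, h, h), (2, 2): (q, q, h),
}
A = {}
for (i, k), row in BASE.items():
    A[(i, k)] = A[(k, i)] = tuple(Fr(x) for x in row)   # the law is symmetric in i, k

assert all(sum(row) == 1 for row in A.values())         # condition (1)

def next_generation(alpha):
    """Equation (2): alpha'_j = sum over i, k of A^j_{ik} alpha_i alpha_k."""
    return tuple(sum(A[(i, k)][j] * alpha[i] * alpha[k]
                     for i, k in product(range(3), repeat=2))
                 for j in range(3))

start = (Fr(1, 2), Fr(1, 3), Fr(1, 6))
first = next_generation(start)
second = next_generation(first)
print(*first)            # 49/144 25/144 35/72
print(second == first)   # True: the second generation repeats the first
```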

3. The first rather important result can be stated as follows:

Theorem: If three classes form a closed biotype under a stable heredity law which is such that the cross of an individual of the first class with an individual of the second class always produces an individual of the third class, then the first two classes represent pure races and the law of heredity is the Mendelian law.





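If the original probabilities are $\alpha$, $\beta$, $\gamma$, then the corresponding probabilities for the next generation can be written as follows:

$$\begin{aligned} \alpha_1 &= A_{11}\alpha^2 + 2A_{12}\alpha\beta + A_{22}\beta^2 + 2A_{13}\alpha\gamma + 2A_{23}\beta\gamma + A_{33}\gamma^2 = f(\alpha, \beta, \gamma), \\ \beta_1 &= B_{11}\alpha^2 + 2B_{12}\alpha\beta + B_{22}\beta^2 + 2B_{13}\alpha\gamma + 2B_{23}\beta\gamma + B_{33}\gamma^2 = \varphi(\alpha, \beta, \gamma), \\ \gamma_1 &= C_{11}\alpha^2 + 2C_{12}\alpha\beta + C_{22}\beta^2 + 2C_{13}\alpha\gamma + 2C_{23}\beta\gamma + C_{33}\gamma^2 = \psi(\alpha, \beta, \gamma), \end{aligned} \tag{6}$$

where $A_{ik} + B_{ik} + C_{ik} = 1$. Since $C_{12} = 1$ by assumption, it follows that

$$B_{12} = A_{12} = 0,$$

since all the coefficients must be positive or zero.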

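The mathematical problem before us consists in determining three homogeneous quadratic forms $f$, $\varphi$, and $\psi$ in $\alpha$, $\beta$, $\gamma$ with non-negative coefficients such that

$$f + \varphi + \psi = (\alpha + \beta + \gamma)^2,$$

and satisfying the conditions of stability, namely

$$\begin{aligned} f(\alpha_1, \beta_1, \gamma_1) &= f(\alpha, \beta, \gamma) = \alpha_1, \\ \varphi(\alpha_1, \beta_1, \gamma_1) &= \varphi(\alpha, \beta, \gamma) = \beta_1, \\ \psi(\alpha_1, \beta_1, \gamma_1) &= \psi(\alpha, \beta, \gamma) = \gamma_1, \end{aligned} \tag{7}$$

for all $\alpha$, $\beta$, $\gamma$ such that $\alpha + \beta + \gamma = 1$. The third equation is, of course, a consequence of the first two.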

The functions $f$, $\varphi$, and $\psi$, being continuous, assume infinitely many values unless they are constants, in which case they may be expressed as quadratic forms by

$$f = p(\alpha + \beta + \gamma)^2, \qquad \varphi = q(\alpha + \beta + \gamma)^2, \qquad \psi = r(\alpha + \beta + \gamma)^2,$$

where $p$, $q$, $r$ are constants. But since the coefficient of $\alpha\beta$ is zero in $f$ and $\varphi$, and 2 in $\psi$, this reduces to the trivial case $f = 0$, $\varphi = 0$, $\psi = 1$, which we can neglect.

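We now write (7) in the form

$$\begin{aligned} \alpha_1 &= f(\alpha, \beta, \gamma) = \alpha(\alpha + \beta + \gamma) + F_1(\alpha, \beta, \gamma), \\ \beta_1 &= \varphi(\alpha, \beta, \gamma) = \beta(\alpha + \beta + \gamma) + F_2(\alpha, \beta, \gamma), \\ \gamma_1 &= \psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) - F_1(\alpha, \beta, \gamma) - F_2(\alpha, \beta, \gamma). \end{aligned} \tag{8}$$

Since

$$f(\alpha, \beta, \gamma) = \alpha(\alpha + \beta + \gamma), \qquad \varphi(\alpha, \beta, \gamma) = \beta(\alpha + \beta + \gamma), \qquad \psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) \tag{9}$$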

⁶ The fact that the variables $\alpha$, $\beta$, $\gamma$ are not independent does not preclude the validity of identifying their coefficients in the equations that follow, since all these equations are homogeneous.


are obviously solutions of (7), it follows that $F_1(f, \varphi, \psi) = 0$ and $F_2(f, \varphi, \psi) = 0$. But, as we have just seen, $f$, $\varphi$, and $\psi$ assume infinitely many values. Therefore $F_1$ and $F_2$ either have a linear factor in common, or else are proportional and irreducible.⁷

We first show that $F_1$ and $F_2$ do not have only a linear factor, $l\alpha + m\beta + n\gamma$, in common; for if they did, this factor would vanish for $\alpha = f$, $\beta = \varphi$, $\gamma = \psi$, so that

$$l f(\alpha, \beta, \gamma) + m\varphi(\alpha, \beta, \gamma) + n\psi(\alpha, \beta, \gamma) = 0. \tag{10}$$

But since neither $f$ nor $\varphi$ has a term in $\alpha\beta$, while $\psi$ has, $n = 0$. Also, since $f$ and $\varphi$ have no negative coefficients, $l$ and $m$ are of opposite signs. Let $l = 1$, while $m = -\mu$, where $\mu > 0$. Then the third equation of (8) can be written

$$\psi(\alpha, \beta, \gamma) = \gamma(\alpha + \beta + \gamma) + (A\alpha + B\beta + C\gamma)(\alpha - \mu\beta). \tag{11}$$

The coefficients of $\alpha^2$ and $\beta^2$ in $\psi$ must be non-negative. Therefore it follows that $A \ge 0$, while $B \le 0$. But the coefficient of $\alpha\beta$ in $\psi$ is 2, while $B - \mu A$ cannot be positive. Therefore $F_1$ and $F_2$ have no linear factor in common, and must be proportional. But since the coefficient of $\alpha\beta$ in $f$ and $\varphi$ is zero, the coefficient of $\alpha\beta$ in both $F_1$ and $F_2$ must be $-1$; therefore $F_1$ and $F_2$ are equal, and we can write $F_1 = F_2 = F$, so that (8) becomes:

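$$\begin{aligned} f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma) + F(\alpha, \beta, \gamma), \\ \varphi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma) + F(\alpha, \beta, \gamma), \\ \psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma) - 2F(\alpha, \beta, \gamma), \end{aligned} \tag{12}$$

where $F$ is an irreducible homogeneous quadratic form in $\alpha$, $\beta$, $\gamma$. Furthermore, the coefficient of $\alpha^2$ in $F$ must be zero, since were it positive, the coefficient of $\alpha^2$ in $f$ would exceed 1, and were it negative, the coefficient of $\alpha^2$ in $\varphi$ would be negative, which is impossible. Similarly, the coefficient of $\beta^2$ in $F$ is also zero. We can therefore write $F$ in the form

$$F(\alpha, \beta, \gamma) = -\alpha\beta + c\alpha\gamma + d\beta\gamma + e\gamma^2. \tag{13}$$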

Moreover, we know that

$$F(\alpha_1, \beta_1, \gamma_1) = F(\alpha S + F,\ \beta S + F,\ \gamma S - 2F) = 0 \tag{14}$$

for all values of $\alpha$, $\beta$, $\gamma$ such that $\alpha + \beta + \gamma = S = 1$. Expanding (14) in a Taylor series about the point $(\alpha S, \beta S, \gamma S)$ in three-space, we get only three terms in the expansion, since all the derivatives of order greater than the second are identically zero; the second-order term can be obtained very simply by putting $\alpha S = \beta S = \gamma S = 0$ in $F(\alpha S + F, \beta S + F, \gamma S - 2F)$. In this way we have

⁷ See Bôcher, Introduction to Higher Algebra, p. 210, Theorem 3.


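$$\begin{aligned} F(\alpha S + F,\ \beta S + F,\ \gamma S - 2F) &= F(\alpha S, \beta S, \gamma S) \\ &\quad + F\left[F'_\alpha(\alpha S, \beta S, \gamma S) + F'_\beta(\alpha S, \beta S, \gamma S) - 2F'_\gamma(\alpha S, \beta S, \gamma S)\right] \\ &\quad + F(F, F, -2F) = 0. \end{aligned} \tag{15}$$

Since $F$ is a homogeneous form of the second degree,

$$F(\alpha S, \beta S, \gamma S) = S^2 F(\alpha, \beta, \gamma); \qquad F(F, F, -2F) = F^2 F(1, 1, -2), \tag{16}$$

while its derivatives with respect to $\alpha$, $\beta$, $\gamma$ are homogeneous linear forms, so that

$$F'_\alpha(\alpha S, \beta S, \gamma S) = S\,F'_\alpha(\alpha, \beta, \gamma), \tag{17}$$

and similarly for $F'_\beta$ and $F'_\gamma$. Substituting these in (15) and dividing out by $F(\alpha, \beta, \gamma)$, which is not identically zero, we get

$$S^2 + S\left[F'_\alpha(\alpha, \beta, \gamma) + F'_\beta(\alpha, \beta, \gamma) - 2F'_\gamma(\alpha, \beta, \gamma)\right] = -F(\alpha, \beta, \gamma)\,F(1, 1, -2). \tag{18}$$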

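But since $F(\alpha, \beta, \gamma)$ is irreducible, $F(1, 1, -2)$ must be zero (otherwise the left side of (18), which is divisible by $S$, would furnish a factorization of $F$). Dividing by $S$, we finally get

$$S = 2F'_\gamma - F'_\alpha - F'_\beta, \tag{19}$$

or

$$\alpha + \beta + \gamma = 2(c\alpha + d\beta + 2e\gamma) - (-\beta + c\gamma) - (-\alpha + d\gamma),$$

from which it follows that

$$c = 0, \qquad d = 0, \qquad e = \tfrac14,$$

and hence

$$F(\alpha, \beta, \gamma) = \tfrac14\gamma^2 - \alpha\beta. \tag{20}$$

Therefore we have found that

$$\begin{aligned} f(\alpha, \beta, \gamma) &= \alpha(\alpha + \beta + \gamma) + \tfrac14\gamma^2 - \alpha\beta = \alpha^2 + \alpha\gamma + \tfrac14\gamma^2 = (\alpha + \tfrac12\gamma)^2, \\ \varphi(\alpha, \beta, \gamma) &= \beta(\alpha + \beta + \gamma) + \tfrac14\gamma^2 - \alpha\beta = \beta^2 + \beta\gamma + \tfrac14\gamma^2 = (\beta + \tfrac12\gamma)^2, \\ \psi(\alpha, \beta, \gamma) &= \gamma(\alpha + \beta + \gamma) - \tfrac12\gamma^2 + 2\alpha\beta = 2(\alpha + \tfrac12\gamma)(\beta + \tfrac12\gamma), \end{aligned} \tag{21}$$

which is the Mendelian law.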

4. We have therefore shown that the Mendelian law is a necessary consequence of any stable law which is such that the cross of the first two classes produces the third, hybrid, class. We have not even assumed a priori that the first two classes are pure races. From a theoretical point of view it is interesting to investigate the possibility of crossing two pure races under different laws of heredity which are nevertheless stable.

We will therefore suppose to start with that the coefficients of $\alpha^2$ in $f(\alpha, \beta, \gamma)$ and of $\beta^2$ in $\varphi(\alpha, \beta, \gamma)$ are equal to unity. Beginning with equations (8) of the previous section, which merely express the condition that the heredity law under consideration be stable, we can write

$$F_1 = F = -a\alpha\beta + c\alpha\gamma + d\beta\gamma + e\gamma^2. \tag{22}$$

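As before, $F_1$ and $F_2$ cannot have a linear factor in common; hence they are proportional, and we can write $F_2 = \lambda F$, $F_1 = F$. We therefore have five coefficients to determine: $a$, $c$, $d$, $e$, and $\lambda$. Since

$$F(\alpha S + F,\ \beta S + \lambda F,\ \gamma S - (\lambda + 1)F) = 0, \tag{23}$$

we have an analogue of (19),

$$S = (1 + \lambda)F'_\gamma - F'_\alpha - \lambda F'_\beta, \tag{24}$$

or

$$\alpha + \beta + \gamma = (1 + \lambda)(c\alpha + d\beta + 2e\gamma) + a\beta - c\gamma + \lambda a\alpha - \lambda d\gamma.$$

Equating coefficients of $\alpha$, $\beta$, $\gamma$ as before, we have

$$1 = c(1 + \lambda) + a\lambda \quad\text{or}\quad c = \frac{1 - \lambda a}{1 + \lambda},$$

$$1 = d(1 + \lambda) + a \quad\text{or}\quad d = \frac{1 - a}{1 + \lambda},$$

$$1 = 2e(1 + \lambda) - c - \lambda d = 2e(1 + \lambda) - \frac{1 - \lambda a}{1 + \lambda} - \frac{\lambda(1 - a)}{1 + \lambda},$$

or

$$e = \frac{1 + \lambda - a\lambda}{(1 + \lambda)^2}.$$

Therefore the most general quadratic form $F$ satisfying our conditions can be written

$$F(\alpha, \beta, \gamma) = -a\alpha\beta + \frac{1 - \lambda a}{1 + \lambda}\,\alpha\gamma + \frac{1 - a}{1 + \lambda}\,\beta\gamma + \frac{1 + \lambda - a\lambda}{(1 + \lambda)^2}\,\gamma^2. \tag{26}$$

If we let $a\lambda = b$, this becomes

$$F(\alpha, \beta, \gamma) = -a\alpha\beta + \frac{a(1 - b)}{a + b}\,\alpha\gamma + \frac{a(1 - a)}{a + b}\,\beta\gamma + \frac{a(a + b - ab)}{(a + b)^2}\,\gamma^2. \tag{27}$$

Substituting this value of $F$ into $f$, $\varphi$, $\psi$ and simplifying, we get

$$\begin{aligned} f(\alpha, \beta, \gamma) &= \left(\alpha + \frac{a}{a + b}\gamma\right)\left[\alpha + (1 - a)\beta + \left(1 - \frac{ab}{a + b}\right)\gamma\right], \\ \varphi(\alpha, \beta, \gamma) &= \left(\beta + \frac{b}{a + b}\gamma\right)\left[(1 - b)\alpha + \beta + \left(1 - \frac{ab}{a + b}\right)\gamma\right], \\ \psi(\alpha, \beta, \gamma) &= (a + b)\left(\alpha + \frac{a}{a + b}\gamma\right)\left(\beta + \frac{b}{a + b}\gamma\right), \end{aligned} \tag{28}$$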





where, in order that all the coefficients be non-negative, it is necessary and sufficient that $0 < a \le 1$ and $0 < b \le 1$. In case $a = b = 1$, formulas (28) coincide with (21) and we get the Mendelian law.

The question of whether there actually exist heredity laws which satisfy (28) with $a \ne 1$ and $b \ne 1$ can only be solved experimentally. Theoretically, formulas (28) give the most general heredity law of a closed biotype consisting of three classes, with the condition that two of the three classes be pure races. It is easy to see that the only law of heredity in which all three classes are pure races is given by the particular solution of (8)

$$f = \alpha(\alpha + \beta + \gamma), \qquad \varphi = \beta(\alpha + \beta + \gamma), \qquad \psi = \gamma(\alpha + \beta + \gamma), \tag{29}$$

in which $F_1 = F_2 = 0$.
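The stability of the two-pure-race laws (28) can be checked numerically for particular values of $a$ and $b$. The Python sketch below uses exact fractions and applies the map twice for several starting distributions. It also confirms that $a = b = 1$ reproduces the Mendelian law (21). The parameter values are arbitrary choices within $0 < a, b \le 1$, and the names are hypothetical.

```python
from fractions import Fraction as Fr

def law28(a, b):
    """Map (alpha, beta, gamma) -> (f, phi, psi) of formulas (28); 0 < a, b <= 1."""
    a, b = Fr(a), Fr(b)
    k, k1 = a / (a + b), b / (a + b)
    m = 1 - a * b / (a + b)
    def step(al, be, ga):
        f = (al + k * ga) * (al + (1 - a) * be + m * ga)
        phi = (be + k1 * ga) * ((1 - b) * al + be + m * ga)
        psi = (a + b) * (al + k * ga) * (be + k1 * ga)
        return f, phi, psi
    return step

starts = [(Fr(1, 2), Fr(1, 4), Fr(1, 4)), (Fr(1, 10), Fr(3, 5), Fr(3, 10))]
for a, b in [(1, 1), (Fr(1, 2), Fr(1, 4)), (Fr(1, 3), 1), (Fr(3, 4), Fr(2, 5))]:
    law = law28(a, b)
    for s in starts:
        g1 = law(*s)
        g2 = law(*g1)
        print(a, b, *g1, g1 == g2, sum(g1) == 1)
```

The check works because each law leaves the two linear factors $\alpha + \frac{a}{a+b}\gamma$ and $\beta + \frac{b}{a+b}\gamma$ unchanged after one generation.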

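5. Supposing as before that the heredity law is stable, it remains to prove the following theorem in order to exhaust all possible biotypes consisting of only three classes.

Theorem: If all classes are hybrid, then

$$f = p(\alpha + \beta + \gamma)^2, \qquad \varphi = q(\alpha + \beta + \gamma)^2, \qquad \psi = r(\alpha + \beta + \gamma)^2. \tag{30}$$

If only one of the classes represents a pure race, then either

$$\begin{aligned} f &= (\alpha + \beta)\left[\tfrac12(1 + b)(\alpha + \beta) + (1 - d)\gamma\right], \\ \varphi &= (\alpha + \beta)\left[\tfrac12(1 - b)(\alpha + \beta) + d\gamma\right], \\ \psi &= \gamma(\alpha + \beta + \gamma), \end{aligned} \tag{31}$$

or

$$f = \alpha S + a\alpha(\mu\beta + \gamma) \quad\text{and}\quad \mu\varphi + \psi = 0. \tag{32}$$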

We have seen that if $f$, $\varphi$, and $\psi$ are functions of $(\alpha + \beta + \gamma)$, then we arrive at (30); in the contrary case we arrive at (8). Here we distinguish two cases: 1) $F_1$ and $F_2$ are irreducible quadratic forms which are proportional, $F_1 = k_1F$, $F_2 = k_2F$; and 2) $F_1$ and $F_2$ have a common factor which is a linear form. Suppose at first that $F$ is a quadratic form. If none of the numbers $k_1$, $k_2$, and $k_1 + k_2$ is zero, then two of them may be taken as positive, say $k_1$ and $k_2$. But then the coefficients of $\alpha^2$ and $\beta^2$ in $F$ would have to vanish in order that $\psi$ have no negative coefficients. But this case of two pure races has already been discussed, and leads to formulas (28). We must therefore suppose next that one of the numbers $k_1$, $k_2$, or $k_1 + k_2$ is zero. Suppose that $k_1 + k_2 = 0$, that is, that the third class is a pure race, and hence that the coefficient of $\gamma^2$ in $\psi$ is unity. Therefore the coefficient of $\gamma^2$ in $F$ must be zero. We can take $k_1 = 1$; then $k_2 = -1$, and therefore the coefficient of $\alpha\gamma$ in $F$ is negative, say $-d$. We can now write





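$$F = a\alpha^2 + b\alpha\beta + c\beta^2 - d\alpha\gamma + e\beta\gamma. \tag{33}$$

We have as before

$$F(\alpha S + F,\ \beta S - F,\ \gamma S) = 0, \tag{34}$$

from which we derive by Taylor's expansion

$$S = F'_\beta - F'_\alpha, \tag{35}$$

or in other words

$$\alpha + \beta + \gamma = b\alpha + 2c\beta + e\gamma - b\beta + d\gamma - 2a\alpha,$$

which leads to

$$F = \tfrac12(b - 1)\alpha^2 + b\alpha\beta + \tfrac12(b + 1)\beta^2 - d\alpha\gamma + (1 - d)\beta\gamma, \tag{36}$$

and hence to $f$ and $\varphi$, which are as follows:

$$\begin{aligned} f &= (\alpha + \beta)\left[\tfrac12(1 + b)(\alpha + \beta) + (1 - d)\gamma\right], \\ \varphi &= (\alpha + \beta)\left[\tfrac12(1 - b)(\alpha + \beta) + d\gamma\right]. \end{aligned} \tag{37}$$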


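The laws (37), in which the third class is a pure race and $\psi = \gamma S$, can be checked the same way. In this Python sketch the names are hypothetical and the parameter values arbitrary, subject to $-1 \le b \le 1$ and $0 \le d \le 1$, which the coefficients of (37) need in order to be non-negative. The map preserves $\alpha + \beta$ and $\gamma$, so the second generation repeats the first.

```python
from fractions import Fraction as Fr

def law37(b, d):
    """Map of formulas (37) with psi = gamma * S (third class a pure race).
    Non-negative coefficients need -1 <= b <= 1 and 0 <= d <= 1."""
    b, d = Fr(b), Fr(d)
    def step(al, be, ga):
        s = al + be + ga
        f = (al + be) * ((1 + b) / 2 * (al + be) + (1 - d) * ga)
        phi = (al + be) * ((1 - b) / 2 * (al + be) + d * ga)
        psi = ga * s
        return f, phi, psi
    return step

starts = [(Fr(1, 2), Fr(1, 3), Fr(1, 6)), (Fr(1, 5), Fr(1, 5), Fr(3, 5))]
for b, d in [(Fr(1, 3), Fr(1, 4)), (-1, 1), (1, 0), (0, Fr(1, 2))]:
    law = law37(b, d)
    for s in starts:
        g1 = law(*s)
        g2 = law(*g1)
        print(b, d, *g1, g1 == g2)
```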
It now remains to suppose that $F$ is a linear form. Let

$$F = \lambda\alpha + \mu\beta + \gamma. \tag{38}$$

Here (with $F_1 = kF$ and $F_2 = k_1F$) the condition that the heredity law be stable leads as before to the equation

$$S = (k + k_1)F'_\gamma - kF'_\alpha - k_1F'_\beta = (k + k_1) - \lambda k - \mu k_1, \tag{39}$$

where $k$ and $k_1$ are linear forms

$$k = a\alpha + b\beta + c\gamma, \qquad k_1 = a_1\alpha + b_1\beta + c_1\gamma. \tag{40}$$


Hence, if we had no restrictions on signs and magnitudes, we could select $k$ arbitrarily, and then we would have $k_1 = [S + (\lambda - 1)k]/[1 - \mu]$, and the solution for $f$, $\varphi$, $\psi$ would depend on five parameters ($\lambda$, $\mu$, $a$, $b$, $c$).

But since in $f = \alpha S + kF$ the coefficients of $\beta^2$, $\beta\gamma$, and $\gamma^2$ are non-negative, $b\mu \ge 0$, $b + \mu c \ge 0$, and $c \ge 0$; similarly, from the same property of $\varphi$ we have $\lambda a_1 \ge 0$, $c_1 \ge 0$, $a_1 + \lambda c_1 \ge 0$. But $\mu$ and $\lambda$ cannot both be non-negative, for then $\lambda f + \mu\varphi + \psi = 0$ would be impossible.

Let $\mu < 0$; then $b = c = 0$, but then the coefficient of $\alpha^2$ in $f$ would be $1 + a\lambda$, which will be too big unless $\lambda = 0$. Hence $F = \mu\beta + \gamma$, $k = a\alpha$, and

$$f = \alpha S + a\alpha(\mu\beta + \gamma),$$

$$\psi = -\mu\varphi = \mu\,\frac{(\alpha + \beta + \gamma)(\beta + \gamma) - a\alpha(\mu\beta + \gamma)}{\mu - 1}.$$

Hence we have exhausted all possible cases and have proved our theorem.
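The linear-form case (32) can be written out in full as $f = \alpha S + a\alpha(\mu\beta + \gamma)$, $\varphi = [S(\beta + \gamma) - a\alpha(\mu\beta + \gamma)]/(1 - \mu)$, $\psi = -\mu\varphi$. The Python sketch below checks stability and the relation $\mu\varphi + \psi = 0$ for a few parameter values with $\mu < 0$. The non-negativity conditions in the docstring ($|a\mu| \le 1$, $|a| \le 1$) come from inspecting the coefficients and are an assumption of this sketch. The names are hypothetical.

```python
from fractions import Fraction as Fr

def law32(a, mu):
    """Map of the linear-form case: f = alpha*S + a*alpha*(mu*beta + gamma),
    phi = [S*(beta + gamma) - a*alpha*(mu*beta + gamma)] / (1 - mu), psi = -mu*phi.
    Assumes mu < 0; non-negative coefficients need |a*mu| <= 1 and |a| <= 1."""
    a, mu = Fr(a), Fr(mu)
    def step(al, be, ga):
        s = al + be + ga
        lin = mu * be + ga                       # the linear form F
        f = al * s + a * al * lin
        phi = (s * (be + ga) - a * al * lin) / (1 - mu)
        psi = -mu * phi
        return f, phi, psi
    return step

starts = [(Fr(1, 2), Fr(1, 3), Fr(1, 6)), (Fr(1, 10), Fr(3, 10), Fr(3, 5))]
for a, mu in [(Fr(1, 2), -1), (Fr(-1, 3), -2), (1, -1)]:
    law = law32(a, mu)
    for s in starts:
        g1 = law(*s)
        print(a, mu, *g1, law(*g1) == g1)
```

Stability follows because $\mu\varphi + \psi = 0$ makes the linear form $F$ vanish on every offspring distribution.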





6. We can summarize our results as follows. The stable heredity laws of a closed biotype of three classes can be divided into the following types:

1. Two classes represent pure races. The heredity laws are given by (28), and in particular, for the Mendelian case, by (21).

2. There are no pure races, and every race can be obtained by crossing the other races. The heredity law is given by (30).

3. All three classes are pure races. The heredity law is given by (29). Any two classes of this biotype also form a closed biotype.

4. One of the classes is a pure race. The heredity laws are given by (31) and (32).



SOME RECENT ADVANCES IN MATHEMATICAL STATISTICS, I

By Burton H. Camp¹
Wesleyan University

The papers considered in this partial review are listed at the end. For the most part they have appeared within the last five years, but in order to explain what has been done within the last five years it has occasionally been necessary to use material that appeared earlier. The subject matter is divided into four parts.


Part I. The Theory of Tests. Since an attempt is being made to present the material of this paper in such a form that it may be read rapidly by those who have not read the underlying literature, the author will endeavor to do little more, in Part I, than to define and illustrate several terms which are being used. Altogether there are nine of these terms. It is fortunate that their meanings can be explained pretty well by reference to an extremely simple picture. Let each of the curves in the figure indicate a probability distribution $p(x; \theta)$, in which there is a single variate $x$ and a single parameter $\theta$.


Example 1. $p(x; \theta) = \dfrac{1}{\sqrt{2\pi}}\,e^{-(x - \theta)^2/2}$, the normal distribution in which the center is at $x = \theta$ and the standard deviation is unity.

Let a random sample $E$ be drawn from a population indicated by such a curve. In the simplest case $E = x$, a single individual. Shortly, we shall have to suppose that there are $N$ individuals: $E = x_1, \dots, x_N$. Eventually, the picture will be generalized much further. The population will be described by a function of $n$ variables, so that, in place of each $x$ of our sample, we shall have $x^1, \dots, x^n$; moreover, there will be not one parameter but $l$ parameters $\theta^1, \dots, \theta^l$, so that our probability distribution will be multivariate and will be denoted by

$$p(x^1, \dots, x^n; \theta^1, \dots, \theta^l).$$

¹ One of two papers read by Cecil C. Craig and by the author at a joint meeting of the Institute of Mathematical Statistics, the Econometric Society and the American Statistical Association, held in New York City on December 30, 1941.

A common way of putting this is to say that x and θ are vectors in n and l dimensions respectively, and to leave the form as originally, p(x; θ). In the figure the space which the samples (E = x) can occupy is of course not more than the x-axis, but in the most general case the sample space will be a part or all of a space of nN dimensions and will be denoted by W. As is well understood, a significance test is an inequality which specifies in W a certain region w_0 as a critical region, and if E is in this w_0, the hypothesis being tested is rejected. For example, in the figure, one might test the hypothesis H_0 that θ = θ_0. The rejection region w_0 might be the part of the x-axis where x > x_0. In all such cases we shall let α equal the probability that E is in w_0 if θ = θ_0. This statement will be denoted as follows:

(1)  $\alpha = P(w_0 \mid \theta_0)$,

P standing for probability.

(i) Power of a test. A good test should satisfy two conditions: (a) if our sample is drawn from the population specified by θ_0, the hypothesis H_0 that θ = θ_0 should be accepted as often as possible, and (b) if our sample is drawn from a population specified by some other value of θ, say θ_1, then the hypothesis that θ = θ_1 should also be accepted as often as possible, that is, H_0 should be rejected as often as possible. Suppose first that there are but these two admissible populations. The probability of (a) is 1 − α. We commonly make the artificial requirement that this shall be some large fraction such as 0.95. The probability of (b) is commonly denoted by β, and in the figure, when the critical region is w_0, β is the area under the θ_1 curve which lies to the right of x = x_0. Relative to θ_0, θ_1, and α, the quantity β is called the power of that test which designates w_0 as the critical region. Also, α and (1 − β) are the probabilities of the so-called errors of the first and second kinds, respectively.
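For the normal curves of Example 1 the two error probabilities can be computed directly. The following is a minimal sketch; the values θ_0 = 0, θ_1 = 1 and α = 0.05 are illustrative assumptions, and the critical region is the one-tailed region x > x_0 of the figure.

```python
from statistics import NormalDist

def size_and_power(theta0, theta1, x0):
    """Return (alpha, beta) for the critical region x > x0 when p(x; theta) is N(theta, 1).

    alpha = P(x > x0 | theta0): probability of an error of the first kind.
    beta  = P(x > x0 | theta1): power of the test against theta1.
    """
    alpha = 1.0 - NormalDist(theta0, 1.0).cdf(x0)
    beta = 1.0 - NormalDist(theta1, 1.0).cdf(x0)
    return alpha, beta

theta0, theta1 = 0.0, 1.0                       # illustrative parameter values
x0 = NormalDist(theta0, 1.0).inv_cdf(0.95)      # boundary chosen so that alpha = 0.05
alpha, beta = size_and_power(theta0, theta1, x0)
print(f"x0 = {x0:.4f}  alpha = {alpha:.4f}  beta = {beta:.4f}  1 - beta = {1 - beta:.4f}")
```

With a single observation β is only about 0.26 here, so the error of the second kind (1 − β) is large even though α is small.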

(ii) Unbiased test. As stated, we would like to have β large. In any case we would like to have β ≥ α. If β ≥ α, the test and the corresponding region w_0 are "unbiased" (relative to the preassigned quantities θ_0, θ_1, and α). The region w_0 appears to be unbiased in our figure. This definition can obviously be extended to the case where, in addition to θ_1, there is an infinity of admissible values of θ; then the test is unbiased relative to the whole family of admissible values of θ if, for every one of these θ's, β ≥ α.

(iii) UMP test and CBC region. If, with respect to a family of admissible θ's, a critical region w_0 exists such that, for each of these θ's (≠ θ_0), β is greater than it would be for any other critical region satisfying (1), then this w_0 is said to be the common best critical (CBC) region and the corresponding test is the uniformly most powerful (UMP) test.





BURTON H. CAMP


(iv) UMPU test and CBCU region. If there is no CBC region, still it may happen that, if one restricts one's view to only unbiased regions, there may be among them a CBC region. Such a region is said to be a common best critical unbiased (CBCU) region, and the corresponding test is the uniformly most powerful unbiased (UMPU) test.

In the following examples, and elsewhere, we shall now use H_0 to indicate the hypothesis being tested, and H* to indicate all admissible alternatives.

Example 2: p(x; θ) normal as in Example 1, E = x, H_0: θ = θ_0, H*: θ > θ_0.

The CBC region is where x > k if

$\int_k^{\infty} p(x; \theta_0)\, dx = \alpha$.

This region is the interval indicated by w_0 in the figure.

Example 3: Same as the preceding example except that now we have H*: θ ≠ θ_0. There is no CBC region, but the CBCU region consists of two tail intervals, where |x − θ_0| > k if

$\int_{\theta_0 + k}^{\infty} p(x; \theta_0)\, dx = \tfrac{1}{2}\alpha$.


A little reflection will convince the reader that the statements in these two examples are at least apparently true. It is geometrically evident, for example, that the last-mentioned region (two tail intervals) is not as powerful with respect to the alternatives of Example 2 (θ > θ_0) as is the single tail region w_0 in the figure.
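The comparison can also be made numerically. In this sketch (θ_0 = 0 and α = 0.05 are illustrative assumptions) the one-tailed region of Example 2 and the two-tailed region of Example 3 are given the same size α. The first is more powerful for θ > θ_0, but its power falls below α for θ < θ_0, so it is not an unbiased region for the alternatives of Example 3.

```python
from statistics import NormalDist

Z = NormalDist()   # standard normal; p(x; theta) is N(theta, 1)

def power_one_tail(theta, theta0=0.0, alpha=0.05):
    """Power of the Example 2 region x > k, with k = theta0 + z_(1 - alpha)."""
    k = theta0 + Z.inv_cdf(1 - alpha)
    return 1 - Z.cdf(k - theta)

def power_two_tail(theta, theta0=0.0, alpha=0.05):
    """Power of the Example 3 region |x - theta0| > k, with k = z_(1 - alpha/2)."""
    k = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(theta0 + k - theta)) + Z.cdf(theta0 - k - theta)

for theta in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(f"theta = {theta:+.1f}   one-tail = {power_one_tail(theta):.4f}   "
          f"two-tail = {power_two_tail(theta):.4f}")
```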

(v) Type A regions. It is often difficult to find even a CBCU region, or such a region may not exist, but it may be that there is a region which has the required properties if one admits only values of θ near to the value θ_0 being tested. Type A regions have this property. More precisely, they have the property that the power of w_0 is a minimum at θ_0 with respect to small changes in θ, and that this is a sharper minimum at θ_0 than is the power of any other w which satisfies equation (1). Here the words "small changes" are used as in the calculus. The full definition [4] of an unbiased region of type A is that it shall satisfy (1) and also the following conditions:

(a) θ shall be a single parameter (not a vector);

(b) $\frac{d}{d\theta} P(w_0 \mid \theta) = 0$ if θ = θ_0;

(c) $\frac{d^2}{d\theta^2} P(w_0 \mid \theta) \ge \frac{d^2}{d\theta^2} P(w \mid \theta)$ when θ = θ_0, for all regions w which satisfy the preceding conditions imposed on w_0.

There are also other types of regions which resemble type A. The following example illustrates type A; it is a familiar problem with an unfamiliar solution.


Example 4. $p(x; \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}$, E = x_1, ..., x_N, H_0: σ = σ_0, H*: σ ≠ σ_0.







The critical region of type A is determined by two tail areas, but they are not equal tail areas of the distribution of Σ x_i².
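The two tail areas can be found numerically. Under H_0 the statistic Σx_i²/σ_0² has the χ² distribution with N degrees of freedom. If the acceptance region is c_1 ≤ Σx_i²/σ_0² ≤ c_2, the power is 1 − [F_N(c_2σ_0²/σ²) − F_N(c_1σ_0²/σ²)], F_N being the χ² distribution function, and requiring its derivative to vanish at σ = σ_0 gives c_1 f_N(c_1) = c_2 f_N(c_2), that is, equal values of c^{N/2}e^{−c/2} at the two limits. The sketch below solves that equation together with the size condition, using a series evaluation of F_N that is adequate only for moderate arguments. N = 10 and α = 0.05 are illustrative, and the helper names are hypothetical; this is an illustration, not the procedure of [4].

```python
import math

def chi2_cdf(x, df):
    """Chi-square distribution function via the power series of the regularized
    lower incomplete gamma function; adequate for the moderate arguments used here."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1))  # z^a e^-z / Gamma(a + 1)
    total, k = term, 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return min(total, 1.0)

def log_kernel(c, n):
    """log of c^(n/2) e^(-c/2), proportional to c times the chi-square density."""
    return 0.5 * n * math.log(c) - 0.5 * c

def bisect(f, lo, hi, iters=200):
    """Bisection root finder; assumes f changes sign on [lo, hi]."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def partner(c1, n):
    """The c2 > n with log_kernel(c2) == log_kernel(c1), for c1 < n (the kernel peaks at n)."""
    target = log_kernel(c1, n)
    hi = 2.0 * n
    while log_kernel(hi, n) > target:
        hi *= 2.0
    return bisect(lambda c: log_kernel(c, n) - target, float(n), hi)

def type_a_limits(n, alpha):
    """Limits c1 < c2 for sum(x_i^2)/sigma0^2 with n degrees of freedom: probability
    1 - alpha between them and zero derivative of the power at sigma0."""
    def excess(c1):
        return chi2_cdf(partner(c1, n), n) - chi2_cdf(c1, n) - (1.0 - alpha)
    lo = n / 2.0
    while excess(lo) <= 0:      # move left until the acceptance probability exceeds 1 - alpha
        lo /= 2.0
    c1 = bisect(excess, lo, float(n))
    return c1, partner(c1, n)

N, alpha = 10, 0.05                      # illustrative choices
c1, c2 = type_a_limits(N, alpha)
eq_lo = bisect(lambda c: chi2_cdf(c, N) - alpha / 2, 1e-9, float(N))
eq_hi = bisect(lambda c: chi2_cdf(c, N) - (1 - alpha / 2), float(N), 10.0 * N)
print(f"type A limits:     {c1:.3f}, {c2:.3f}")
print(f"equal-tail limits: {eq_lo:.3f}, {eq_hi:.3f}")
```

Both type A limits lie to the right of the equal-tail limits, so the type A region takes more than α/2 from the lower tail and less than α/2 from the upper tail.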
(vi) Test unbiased in the limit [11]. (vii) Asymptotically MP test [15]. (viii) Asymptotically MPU test [15]. In these cases the complete definitions are too lengthy to be repeated here, and they cannot be recapitulated briefly. The general idea is that, if none of the regions of the preceding types exist, still it may be true that there are regions which do have approximately the desired properties if E = x_1, ..., x_N and N is large. The following example [11] illustrates (vi).

Example 5. $p(x; \theta) = \frac{1}{\pi}\,\frac{1}{1 + (x - \theta)^2}$, E = x_1, ..., x_N, H_0: θ = 0, H*: θ ≠ 0. Regions of type A unbiased in the limit are defined by an inequality in the x_i and a constant A_N [the inequality is illegible in the source].

Here A_N is a quantity that has to be approximated and tabulated. The inequality is not simple, but it furnishes a definite answer to the problem.

(ix) Regions similar to sample space. All the preceding definitions apply to the case where x is a vector in n space, but not all to the case where θ is a vector in l space. Suppose now that this is the case, or, as we have said before, that there are l different parameters θ^{(1)}, ..., θ^{(l)}, each being capable of taking on a variety of values. Suppose we fix our attention on θ^{(1)} and wish to test the hypothesis that θ^{(1)} = θ_0^{(1)}. First of all we wish to find a critical region w_0 for which an equation like (1) will be true, independently of what the values of the other parameters may be. Such a region is said to be "similar" to sample space; the "similarity" consists in the fact that the equation like (1) would be true independently of the other parameters if w_0 were replaced by all of sample space W, and if α = 1. Feller [10] has shown that there are simple cases in which there is no region similar to sample space. He and others have investigated the conditions under which such regions do exist. "Generally speaking it seems that for most of the probability laws p(x; θ^{(1)}, ..., θ^{(l)}) in which the composite probability law for sample space is made up by multiplication,

(2)  $p(x_1, \ldots, x_N; \theta^{(1)}, \ldots, \theta^{(l)}) = \prod_{i=1}^{N} p(x_i; \theta^{(1)}, \ldots, \theta^{(l)})$,

there do exist such similar regions, at least if N > l."
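Student's t-test is the familiar example of a region similar to sample space: for H_0: ξ = 0 in a normal population with unknown σ, the region |t| > t_α has probability α whatever σ may be. The simulation sketch below assumes N = 10 and α = 0.05 (2.262 is the two-sided 5 per cent point of t with 9 degrees of freedom). Because t does not change when every observation is multiplied by σ, re-using the same random numbers for each σ gives identical rejection rates; powers of two are used for σ so that the rescaling is exact in floating point.

```python
import math
import random

def t_statistic(xs, xi0):
    """Student's t for H0: mean = xi0, with the (N - 1)-divisor standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return (mean - xi0) / (s / math.sqrt(n))

def rejection_rate(sigma, n=10, trials=20000, t_crit=2.262, seed=1):
    """Fraction of samples from N(0, sigma^2) falling in the region |t| > t_crit.

    H0 (mean = 0) is true here, so this estimates the size of the region."""
    rng = random.Random(seed)   # same seed, hence the same normal draws, for every sigma
    hits = 0
    for _ in range(trials):
        xs = [sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(t_statistic(xs, 0.0)) > t_crit:
            hits += 1
    return hits / trials

rates = {sigma: rejection_rate(sigma) for sigma in (0.5, 2.0, 8.0)}
print(rates)
```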


Part II. Estimation. (i) Estimation by interval. So far we have been considering possible answers to the question: Shall specified values of θ^{(1)}, ..., θ^{(l)} be accepted? The totality of values of the θ's which are so acceptable might be called the acceptable point set in parameter (l-dimensional) space. This point set is determined by the sample or experiment (E), and usually different point sets are determined by different E's. Frequently this set of points constitutes a simple closed region, or, in the case of only one parameter, it may be a single interval. Such an interval is called a fiducial or confidence interval. The fundamental property of such a point set or interval is well known, but has to be stated with some care: If α = 0.01, and if one is about to take a sample from a population in which the true values of the parameters θ^{(1)}, ..., θ^{(l)} are θ_0^{(1)}, ..., θ_0^{(l)}, then the probability is 0.99 that the sample will be such that the point set determined by it will contain this true parameter point (θ_0^{(1)}, ..., θ_0^{(l)}). It does not matter whether or not one knows what these true values of the parameters are. If there is more than one parameter, the fiducial interval for one of these parameters often does not exist; that is, there is often no such interval which is independent of the values of the other parameters. The question whether there is such an interval is obviously connected with the question whether there are regions similar to sample space. But if one fiducial interval does exist, then usually there are an infinity of them, and our problem is to choose the best one. This problem is called "estimation by interval." One answer is to choose the shortest interval. More precisely, one should say the shortest system of intervals. One gets a system of intervals by fixing α but not E. What is desired is a formula which will give the shortest interval for every E, but it may well happen that one formula (system) will supply the shortest intervals for some E's, and another will supply the shortest intervals for other E's. The choice between the two systems will then depend on the relative frequency with which the shortest intervals will be supplied by one system or by the other.
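The fundamental property can be checked by simulation. The sketch below assumes the simplest case, a normal population with known standard deviation 1, and the interval x̄ ± 2.576/√N, for which α = 0.01; the sample size, seed and true means are arbitrary. Before sampling, about 99 per cent of the intervals cover the true mean, whatever that mean is.

```python
import math
import random

Z_99 = 2.5758   # two-sided 1 per cent point of the standard normal, so alpha = 0.01

def interval(xs):
    """Interval x_bar -/+ Z_99 / sqrt(N) for the mean of a normal population with sigma = 1."""
    n = len(xs)
    mean = sum(xs) / n
    half_width = Z_99 / math.sqrt(n)
    return mean - half_width, mean + half_width

def coverage(xi_true, n=5, trials=20000, seed=7):
    """Fraction of simulated samples whose interval contains xi_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = interval([rng.gauss(xi_true, 1.0) for _ in range(n)])
        if lo <= xi_true <= hi:
            hits += 1
    return hits / trials

for xi in (0.0, 3.0, -12.5):
    print(xi, coverage(xi))
```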

Example 6: p(x; ξ, σ) is normal, ξ indicating the mean and σ the standard deviation. Given E = x_1, ..., x_N, to estimate ξ. The shortest system of confidence intervals does not exist (independently of σ).

Example 7. Same as Example 6, except that now one seeks only an upper limit to the confidence interval, which the parameter must not exceed. Then the shortest system (best one-sided estimate) is ξ ≤ x̄ + ts, where Fisher's t and s are meant; t corresponds to a preassigned α, and x̄ is the mean of the sample.

In cases like Example 6, where the shortest system does not exist, Neyman [7] defines a "short unbiased system."

Example 8. The short unbiased system for Example 6 is x̄ − ts ≤ ξ ≤ x̄ + ts, with t, s, x̄ as in Example 7.

(ii) Single estimators. Suppose that, as before, we have a sample (E) and wish to choose the best single value for one of the parameters, not, as before, its best fiducial interval. It is well known that there often exists a fiducial function g(θ) which, like a probability function, is everywhere positive or zero and has an integral

$\int_{-\infty}^{\infty} g(\theta)\, d\theta = 1$,

and is further useful in determining confidence intervals. In particular, if θ is a location parameter and if the composite probability function is as in (2), with only one parameter θ, then g(θ) = k p(x_1 − θ) ⋯ p(x_N − θ), k being a constant. An estimate commonly thought of as best is the maximum likelihood estimate; this is the mode of g(θ). Other estimates that have interesting properties are the mean and the median of g(θ). Pitman [14] defines a new "best" estimate θ_P. This has the property that, for every h > 0, θ_P is within h of the true value θ more often than is any other estimate. More precisely, if

$P(|\theta_P - \theta| \le h) \ge P(|\theta_1 - \theta| \le h)$

for all positive values of h, and if strict inequality between the P's holds for some positive value of h, θ_1 being every other estimate, then θ_P is the "best" estimate. As before, P stands for probability.
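The fiducial function of a location parameter can be evaluated numerically, and its mode (the maximum likelihood estimate), mean and median compared. In the sketch the Cauchy density is an illustrative choice of p (for the normal density all three coincide with x̄); the grid limits and number of steps are arbitrary, and mass outside the grid is ignored.

```python
import math

def cauchy(u):
    """Cauchy density, used here as an illustrative choice of p."""
    return 1.0 / (math.pi * (1.0 + u * u))

def fiducial_summary(xs, lo=-30.0, hi=30.0, steps=60000):
    """Mode, mean and median of g(theta) = k * p(x_1 - theta) ... p(x_N - theta).

    g is evaluated on a midpoint grid over [lo, hi]; k is fixed numerically
    so that g integrates to 1 over the grid.
    """
    h = (hi - lo) / steps
    thetas = [lo + (i + 0.5) * h for i in range(steps)]
    g = [math.prod(cauchy(x - t) for x in xs) for t in thetas]
    k = 1.0 / (sum(g) * h)
    g = [k * v for v in g]
    mode = thetas[max(range(steps), key=lambda i: g[i])]   # maximum likelihood estimate
    mean = h * sum(t * v for t, v in zip(thetas, g))
    cumulative, median = 0.0, thetas[-1]
    for t, v in zip(thetas, g):
        cumulative += v * h
        if cumulative >= 0.5:
            median = t
            break
    return mode, mean, median

print(fiducial_summary([-1.0, 0.0, 1.0]))   # symmetric sample: all three near 0
print(fiducial_summary([0.0, 0.5, 4.0]))    # skewed sample: the three need not coincide
```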

Example 9. If p(x; σ) is normal and the sample is E = x_1, ..., x_N, the "best" estimate of σ is [formula illegible in the source] instead of the usual estimates, which have the divisors N − 1 and N.

(iii) Weight function. Wald [13] defines a weight function V(θ, θ_E) which depends on the seriousness of the error committed when the estimate θ_E is used in place of the true value of the parameter θ. The sample is E = x_1, ..., x_N, and θ may be a vector. Then he defines a risk function,

$r(\theta) = \int_W V(\theta, \theta_E)\, p(x_1, \ldots, x_N \mid \theta)\, dW$,

and the "best" θ_E is that one which minimizes the total risk,

$\int r(\theta)\, df(\theta)$,

the integral being taken over all of the parameter space, and f(θ) being the a priori distribution of θ. It is undesirable to introduce f(θ), but it can be shown that, subject to slight restrictions on the nature of f, one can obtain a best estimate by finding a value θ_E which for all θ's makes r equal to a constant and also satisfies other general conditions; this equation and these conditions do not contain f(θ). In a symmetrical but otherwise fairly general case θ_E is the maximum likelihood solution.
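The constant-risk idea can be seen in a standard textbook case that is not taken from Wald's paper: X successes in n binomial trials, with the weight function assumed to be squared error, V = (θ_E − θ)². The estimator (X + √n/2)/(n + √n) has risk 1/[4(1 + √n)²] for every θ, whereas the risk θ(1 − θ)/n of the maximum likelihood estimate X/n varies with θ. The sketch computes both risks exactly; n = 16 is an arbitrary choice.

```python
import math

def risk(estimator, n, theta):
    """Exact risk r(theta) = E[(estimate - theta)^2] when X ~ Binomial(n, theta)."""
    return sum(
        math.comb(n, x) * theta**x * (1 - theta) ** (n - x) * (estimator(x, n) - theta) ** 2
        for x in range(n + 1)
    )

def mle(x, n):
    """Maximum likelihood estimate X / n."""
    return x / n

def constant_risk(x, n):
    """(X + sqrt(n)/2) / (n + sqrt(n)), whose squared-error risk does not depend on theta."""
    root = math.sqrt(n)
    return (x + root / 2) / (n + root)

n = 16
for theta in (0.1, 0.3, 0.5, 0.9):
    print(f"theta = {theta}: MLE risk = {risk(mle, n, theta):.5f}   "
          f"constant-risk estimator = {risk(constant_risk, n, theta):.5f}")
```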

Part III. Likelihood Tests. This part has to do mostly with special cases of likelihood tests. As is well known, this test consists in selecting a critical rejection region w in sample space where

(a) P(w | θ_0) = α, and

(b) the relative likelihood of H_0 is small; more precisely, where λ < constant, and

$\lambda = \frac{\max_{\omega} P(E \mid \theta)}{\max_{\Omega} P(E \mid \theta)}$,

ω being the region in parameter space specified by the hypothesis tested, H_0, and Ω being the region in parameter space specified by all admissible hypotheses. (In special cases max is replaced by least upper bound.) If H_0 is simple (ω being a point) and if the CBC region w_0 exists, then w_0 is bounded by the contour λ = constant [19]. Otherwise this λ test does not necessarily yield the same critical regions as do any of the preceding tests. But it is much easier to apply, and, in many of the cases that follow, the λ tests are good ones as judged by the preceding theory. They are powerful even if not the most powerful of all tests, and often this power can be found and tabulated. In fact Wilks [29] has shown that the appropriate distribution of λ (omitting terms of order 1/√N) can be found² if the distribution of E is

$p(x_1; \theta^{(1)}, \ldots, \theta^{(l)}) \cdots p(x_N; \theta^{(1)}, \ldots, \theta^{(l)})$,

and if the optimum estimates of θ^{(1)}, ..., θ^{(l)} exist and are distributed normally except for certain terms of order 1/√N. This theorem has now been generalized by Wald, in a paper presented to the American Mathematical Society in December, 1941.
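The large-sample result can be checked by simulation in the simplest case, H_0: ξ = ξ_0 for a normal population with unknown σ, where maximizing the likelihood over σ gives −2 log λ = N log(σ̂_ω²/σ̂_Ω²). Under H_0 the proportion of samples with −2 log λ > 3.841 (the upper 5 per cent point of χ² with 1 degree of freedom) should be near 0.05. N = 50, the seed and the number of trials are arbitrary choices of this sketch.

```python
import math
import random

def minus_two_log_lambda(xs, xi0):
    """-2 log(lambda) for H0: mean = xi0, normal population, sigma unknown.

    Maximizing over sigma gives lambda = (s2_full / s2_null) ** (N / 2), where
    s2_full = sum((x - x_bar)^2) / N and s2_null = sum((x - xi0)^2) / N.
    """
    n = len(xs)
    mean = sum(xs) / n
    s2_full = sum((x - mean) ** 2 for x in xs) / n
    s2_null = sum((x - xi0) ** 2 for x in xs) / n
    return n * math.log(s2_null / s2_full)

rng = random.Random(12345)
N, trials = 50, 10000
exceed = sum(
    minus_two_log_lambda([rng.gauss(0.0, 1.0) for _ in range(N)], 0.0) > 3.841
    for _ in range(trials)
)
rate = exceed / trials
print(f"P(-2 log lambda > 3.841) is about {rate:.4f}; the chi-square value is 0.05")
```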

There are many of these tests, made to fit all sorts of hypotheses. The author will try to summarize a considerable group of them; all members of the group might be called generalizations of the Student-Fisher t-test. They fall naturally into two classes, according as to whether the individuals of the sample are taken from a univariate or from a multivariate universe. Unless otherwise stated all universes shall be normal. H_0 shall stand for the hypothesis being tested, and H* for all admissible alternatives to H_0.

(i) Univariate case. The sample consists of N elements, as before, chosen independently from N normal populations indicated by their parameters (ξ_1, σ_1), ..., (ξ_N, σ_N). About these populations we may ask a variety of questions resulting in a variety of problems and tests.

Problem (a): If the populations are all identical (ξ, σ), does ξ = ξ_0 (ξ_0 specified in advance)? This results in the well-known t-test. The hypothesis tested, H_0, is that ξ = ξ_0, and the alternative hypothesis H* is that ξ ≠ ξ_0, it being assumed at the outset that all the populations are identical. The t-test has been shown to be an UMPU test relative to H*.
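For Problem (a) the likelihood ratio can be written out explicitly, and it is a decreasing function of |t|, so that λ < constant and |t| > constant define the same critical region. The identity λ = (1 + t²/(N − 1))^{−N/2} is standard; the sketch checks it numerically on arbitrary illustrative data.

```python
import math

def t_and_lambda(xs, xi0):
    """Student's t and the likelihood ratio lambda for H0: mean = xi0 (sigma unknown)."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)    # sum of squares about the sample mean
    ss0 = sum((x - xi0) ** 2 for x in xs)    # sum of squares about xi0
    t = (mean - xi0) / math.sqrt(ss / (n - 1) / n)
    lam = (ss / ss0) ** (n / 2)
    return t, lam

xs = [4.9, 5.6, 5.1, 6.2, 5.8, 5.3]         # arbitrary illustrative data
n = len(xs)
for xi0 in (5.0, 4.0):
    t, lam = t_and_lambda(xs, xi0)
    print(f"xi0 = {xi0}: t = {t:.4f}  lambda = {lam:.6f}  "
          f"(1 + t^2/(N - 1))^(-N/2) = {(1 + t * t / (n - 1)) ** (-n / 2):.6f}")
```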

Problems (b), (c), (d): Let these same samples be arranged in k groups or "columns," where the n_i (the numbers of individuals in the columns) are not necessarily all equal. Let it be assumed that the populations (ξ, σ) do not change within the columns. Problems (b), (c) and (d), with their corresponding tests, may be indicated as follows:

(b) Are (ξ, σ) constant from column to column? (The λ_H, or L, test.)

² The distribution of (−2 log λ) is like that of χ² except for terms of order 1/√N, ... under some conditions. Transactions of the American Mathematical Society, Vol. 38 (1934).







(c) Is σ constant from column to column regardless of what values the ξ's may have? (The L_1 test.)

(d) Is ξ constant from column to column assuming the σ's constant? (The λ_{H_2} test.)

In Problem (b), H_0 is that (ξ, σ) are constant, H* that they are not constant. In Problem (c), H_0 is that σ is constant, H* that it is not constant. In Problem (d), H_0 is that ξ is constant, H* that it is not constant. The test of Problem (c) has recently been shown to be unbiased only if the numbers in all the columns are the same (n_1 = ⋯ = n_k). It is, however, unbiased in the limit. Power tables were published in 1937 [23]. Bartlett's (1937) test is another test for this problem, and Pitman's [36] L test is another, but it has been shown that these two tests are equivalent. Both are unbiased; they are not likelihood tests. This problem is frequently called the problem of the "homogeneity" of a set of variances.
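Bartlett's test is simple to compute. The sketch below uses the usual textbook form of Bartlett's statistic, M divided by a correction factor C, with unbiased column variances; under H_0 it is approximately χ² with k − 1 degrees of freedom. It is an illustration, not a transcription of the papers cited, and the example data are arbitrary.

```python
import math

def bartlett_statistic(groups):
    """Bartlett's statistic M / C for H0: all column variances are equal.

    s_i^2 uses divisor n_i - 1; the pooled variance weights by n_i - 1.
    """
    k = len(groups)
    dfs = [len(g) - 1 for g in groups]
    variances = []
    for g in groups:
        m = sum(g) / len(g)
        variances.append(sum((x - m) ** 2 for x in g) / (len(g) - 1))
    total_df = sum(dfs)
    pooled = sum(d * v for d, v in zip(dfs, variances)) / total_df
    m_stat = total_df * math.log(pooled) - sum(d * math.log(v) for d, v in zip(dfs, variances))
    c = 1 + (sum(1 / d for d in dfs) - 1 / total_df) / (3 * (k - 1))
    return m_stat / c

print(bartlett_statistic([[1.0, 2.0, 3.0], [0.0, 5.0, 10.0], [4.0, 4.5, 6.0]]))
```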

All these tests are, of course, functions of the observations, and the details are readily available in the papers listed. For example, Pitman's L [formula illegible in the source] is computed from quantities M_i, where M_i is what he calls the "squariance" for the ith column, and a large value of L is significant. The squariance is what the physicists had called, and what statisticians ought therefore to have called, the second moment, viz.: N_i s_i²; s_i² is really the unit second moment.

(e) Linear hypothesis. Problems like the above, and many others, can be included in a general theorem by Kolodziejczyk, who showed how to write out quite simply the likelihood test if each ξ is a linear function of l parameters a (l < N) and if the hypothesis H_0 specifies the values of r different linear functions of the a's (r ≤ l). Furthermore, the power of this test (with numerous applications) was discussed and tabulated by Tang in an important paper [39].
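Problem (d) is perhaps the simplest instance of a linear hypothesis: each ξ is one of k column means, and H_0 sets the k − 1 differences between them to zero. The likelihood test then reduces to the analysis-of-variance F ratio, with λ^{2/N} = [1 + (k − 1)F/(N − k)]^{−1}. A sketch with arbitrary illustrative data:

```python
def one_way_anova(groups):
    """Return (F, lam) for H0: equal column means, a common sigma being assumed.

    F = [SS_between / (k - 1)] / [SS_within / (N - k)];
    lam = (SS_within / SS_total) ** (N / 2) is the likelihood ratio.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    lam = (ss_within / (ss_within + ss_between)) ** (n_total / 2)
    return f_ratio, lam

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_ratio, lam = one_way_anova(groups)
print(f"F = {f_ratio:.3f}, lambda = {lam:.3e}")
```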

Problem (f). This method (e) has been used by Neyman [43] to test the homogeneity of a set of variances, the problem already studied by a number of authors. It has been stated that some of their tests were unbiased with respect to the alternative hypothesis that the σ's were not all equal. Neyman gives reasons for supposing, in the industrial problem he is considering, that it would be more realistic to consider another alternative hypothesis, namely, H* that the σ's are not all equal and that their distribution can be approximately described by saying that 1/σ² has a χ² distribution. No UMP test exists, but there does exist a critical region whose power, with respect to a sub-family of H*, is independent of the means, and the corresponding test is the most powerful test for this sub-family of alternatives. Tables of its power are furnished. More applications are promised.

(ii) Multivariate case. The sample consists of N elements, exactly as before, except that now each x is a vector in n space and comes from a multivariate normal universe whose means may be represented again by ξ if we think of ξ as being a vector in n space. The other parameters of this universe are the variances and covariances σ_ij. So, with these changes, we may repeat the statement at the beginning of (i) that the sample is x_1, ..., x_N, and that the populations are (ξ_1, σ_ij1), ..., (ξ_N, σ_ijN). The questions to be asked about these populations correspond exactly to those asked in the simpler case.

Problem (a): If the populations are all identical (ξ, σ_ij), does ξ = ξ_0 (ξ_0 specified in advance)? The answer is given by Hotelling's T test. The hypothesis tested, H_0, is that the vector ξ = ξ_0, and the alternative hypothesis H* is that these two vectors are not identical. P. L. Hsu [28] has shown that this test is the most powerful in a special sense, and has given a new demonstration of it by the use of the Laplace transform. Incidentally he has shown that the Laplace transform of an elementary probability law determines the law uniquely except perhaps at a null set of points.
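For n = 2 variates the statistic can be computed without a matrix library. The sketch uses the usual form T² = N(x̄ − ξ_0)′S⁻¹(x̄ − ξ_0), with S the sample covariance matrix (divisor N − 1); under H_0, (N − n)T²/[n(N − 1)] has the F distribution with n and N − n degrees of freedom. The four sample points are arbitrary.

```python
def hotelling_t2(points, xi0):
    """Hotelling's T^2 = N (x_bar - xi0)' S^-1 (x_bar - xi0) for bivariate observations.

    S is the sample covariance matrix with divisor N - 1; its 2 x 2 inverse
    is written out explicitly.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = mx - xi0[0], my - xi0[1]
    # d' S^-1 d with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
    return n * (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det

def t2_to_f(t2, n_obs, p=2):
    """Convert T^2 to the F ratio with (p, N - p) degrees of freedom."""
    return (n_obs - p) / (p * (n_obs - 1)) * t2

sample = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
t2 = hotelling_t2(sample, (1.0, 0.0))
print(t2, t2_to_f(t2, len(sample)))
```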

Problems (b), (c), (d): Now let the same sample be arranged in k groups or columns, as in (i) (b), (c), (d); and let it be assumed that the populations (ξ, σ_ij) do not change within the columns. Problems (b), (c), and (d), with their corresponding tests, may be indicated as follows:

(b) Are (ξ, σ_ij) constant from column to column? (The λ_vmc test.)

(c) Are the σ_ij constant from column to column regardless of what values the ξ's may have? (The λ_vc test.)

(d) Is the vector ξ constant from column to column assuming the σ_ij constant from column to column? (The λ_m test.)

Unfortunately, in the customary notation, the λ's for this case (ii) do not follow the pattern adopted in (i). It would be better to put (n) after each of the λ's (or L's) in (i) to signify the corresponding tests in (ii). But, even if this were agreed upon, there would still be a confused notation because there are many other "λ" and "L" tests besides those listed here. Apparently⁴ the power functions of these last three multivariate tests have not been found yet.

(e) The linear hypothesis theory was shown to be applicable to the multivariate case in a special instance by P. L. Hsu in 1940 [38]. Since then he has generalized it further [45].

(iii) Bivariate case. This important special case of (ii) has now been pretty thoroughly solved. A general summary of various tests which have been devised by Finney, Pitman, Morgan, Wilks, and E. S. Pearson was given by C. T. Hsu in 1940 [42], with some slight additions and with tables of power functions with respect to certain alternatives. Altogether there are seven of these tests corresponding to seven different problems, including the four just referred to in Problems (a), (b), (c), and (d).

⁴ So far as the author is aware; but he does not pretend to have made a careful search.

Part IV. The Method of Randomization. This part concerns randomization of the individuals within a sample to obtain a method of testing hypotheses without making use of any characteristic of the population from which the sample was drawn. It does not deal with randomization in field experiments to offset the effects of variable fertility. Also, in this discussion, the hypothesis being tested is not that the sample was a random sample; it is assumed that the given sample is random. We begin with an example from Pitman. Two samples, (x_1, ..., x_N) and (y_1, ..., y_M), have been drawn at random from two populations. The means of the samples are x̄ and ȳ, respectively. Let |x̄ − ȳ| be called the spread of these samples. Now rearrange these same x's and y's with each other in all possible ways to obtain all possible spreads. The larger the observed spread, among all these possible spreads, the more significant it is supposed to be as a test of the (null) hypothesis that the two populations were identical. Similarly, tests have been devised for correlations, variances, etc.
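The rearrangement argument can be written as an exact enumeration for small samples. In this sketch individuals are distinguished by position, so tied values count separately; reporting the proportion of rearrangements whose spread is at least the observed one, and the small tolerance for floating-point ties, are choices of the sketch. The number of rearrangements grows very fast, so this is practical only for small samples.

```python
from itertools import combinations

def spread_p_value(xs, ys):
    """Proportion of all rearrangements of the pooled values into groups of sizes
    len(xs) and len(ys) whose spread |mean_1 - mean_2| is at least the observed one."""
    pooled = list(xs) + list(ys)
    n, m = len(xs), len(ys)
    total = sum(pooled)
    observed = abs(sum(xs) / n - sum(ys) / m)
    at_least, count = 0, 0
    for idx in combinations(range(n + m), n):
        s = sum(pooled[i] for i in idx)
        spread = abs(s / n - (total - s) / m)
        count += 1
        if spread >= observed - 1e-12:       # small tolerance for floating-point ties
            at_least += 1
    return at_least / count

print(spread_p_value([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))   # 2 of the 20 rearrangements
```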

E. S. Pearson [51] in 1938 published a criticism of this general theory which in substance seems to be that the reason why one calls the largest spreads significant, rather than the smallest ones, in the illustration just used, is that one is assuming tacitly that the admissible populations are such that large spreads would be more likely on some other than the null hypothesis; that if one does not make some such implicit assumption, then one might quite as well call the smallest spreads significant; and that therefore, barring such implicit assumptions, one can control only errors of the first kind by this method.⁵

It seems to the author that Pearson's criticism is sound, and that, if indeed one is unwilling to make any assumption whatever about the populations considered, then this device is of no value in testing the null hypothesis. For, if all that one pretends to do is to control errors of the first kind, one can do that by consulting a table of random numbers of two digits. Thus one can control errors of the first kind without performing the experiment at all, let alone making the long computations usually required by the method of randomization. Or, better, one can reduce that error to zero simply by making up one's mind that one will never reject the hypothesis being tested: certainly one will never reject it improperly if one never rejects it at all.

However, if one is willing to make in the illustration used the very mild assumption that the populations considered are such that unusually large spreads would more probably be obtained from some admissible hypothesis other than the null hypothesis, then it would seem to the author that the method would be useful. Similar remarks apply to the tests for correlations, variances, etc.

⁵ Pearson's language is not so strong as this. He says "perhaps it should be described as a valuable device rather than a fundamental principle."

REFERENCES

Part I. Theory of Tests and Part II. Estimation

[1] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Phil. Trans. Roy. Soc., A, Vol. 231 (1933), pp. 289-337.
[2] J. Neyman and E. S. Pearson, "The testing of statistical hypotheses in relation to probabilities a priori," Proc. Camb. Phil. Soc., Vol. 29 (1933), pp. 492-510.
[3] S. S. Wilks, "Test criteria for statistical hypotheses involving several variables," Jour. Am. Stat. Assn., Vol. 30 (1935), pp. 549-560.
[4] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses. I. Unbiased critical regions of type A and type A_1," Stat. Res. Mem., Univ. of London, Vol. 1 (1936), pp. 1-37.
[5] J. Neyman and E. S. Pearson, "Sufficient statistics and uniformly most powerful tests of statistical hypotheses," Stat. Res. Mem., Vol. 1 (1936).
[6] E. J. G. Pitman, "The 'closest' estimates of statistical parameters," Proc. Camb. Phil. Soc., Vol. 33 (1937), pp. 212-222.
[7] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability," Phil. Trans. Roy. Soc., A, Vol. 236 (1937).
[8] J. Neyman, "On statistics the distribution of which is independent of the parameters involved in the original probability law of the observed variables," Stat. Res. Mem., Vol. 2 (1938), pp. 58-59.
[9] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Mem., Vol. 2 (1938), pp. 25-57.
[10] W. Feller, "Note on regions similar to the sample space," Stat. Res. Mem., Vol. 2 (1938), pp. 117-125.
[11] J. Neyman, "Tests of statistical hypotheses which are unbiased in the limit," Annals of Math. Stat., Vol. 9 (1938), pp. 69-86.
[12] S. S. Wilks, "Fiducial distributions in fiducial inference," Annals of Math. Stat., Vol. 9 (1938), pp. 272-280.
[13] A. Wald, "Contributions to the theory of statistical estimation and testing hypotheses," Annals of Math. Stat., Vol. 10 (1939), pp. 299-326.
[14] E. J. G. Pitman, "The estimation of the location and scale parameters of a continuous population of any given form," Biometrika, Vol. 30.
[15] A. Wald, "Asymptotically most powerful tests of statistical hypotheses," Annals of Math. Stat., Vol. 12 (1941), pp. 1-19.

Part III. Likelihood Tests

[16] H. Hotelling, "Generalized t-test," Annals of Math. Stat., Vol. 2 (1931), pp. 360-378.
[17] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, Vol. 24 (1932), pp. 471-494.
[18] E. S. Pearson and S. S. Wilks, "Methods of statistical analysis appropriate for k samples of two variables," Biometrika, Vol. 25 (1933), pp. 353-378.
[19] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Phil. Trans. Roy. Soc., A, Vol. 231 (1933), pp. 289-337.
[20] B. L. Welch, "Some problems in the analysis of regression among k samples of two variables," Biometrika, Vol. 27 (1935), pp. 145-160.
[21] S. S. Wilks, "Test criteria for statistical hypotheses involving several variables," Jour. Am. Stat. Assn., Vol. 30 (1935), pp. 549-560.
[22] P. P. N. Nayer, "An investigation into the application of Neyman and Pearson's L_1 test, with tables of percentage limits," Stat. Res. Mem., Vol. 1 (1936), University College, London, pp. 38-56.
[23] S. S. Wilks and C. M. Thompson, "The sampling distribution of the criterion λ_{H_1} when the hypothesis tested is not true," Biometrika, Vol. 29 (1937).
[24] D. J. Finney, "The distribution of the ratio of estimates of the two variances in a sample from a normal bivariate population," Biometrika, Vol. 30 (1938), pp. 190-192.
[25] D. N. Lawley, "A generalization of Fisher's z-test," Biometrika, Vol. 30 (1938), pp. 180-187.
[26] P. L. Hsu, "Contribution to the theory of 'Student's' t-test as applied to the problem of two samples," Stat. Res. Mem., Vol. 2 (1938), pp. 1-24.
[27] P. L. Hsu, "On the best unbiased quadratic estimate of the variance," Stat. Res. Mem., Vol. 2 (1938), pp. 91-104.
[28] P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9 (1938).
[29] S. S. Wilks, "The large-sample distribution of the likelihood ratio for testing composite hypotheses," Annals of Math. Stat., Vol. 9 (1938), pp. 60-62.
[30] F. N. David and J. Neyman, "Extension of the Markoff theorem on least squares," Stat. Res. Mem., Vol. 2 (1938), pp. 105-116.
[31] D. J. Bishop and U. S. Nair, "A note on certain methods of testing for the homogeneity of a set of variances," Jour. Roy. Stat. Soc., Supp., Vol. 6 (1939).
[32] G. W. Brown, "On the power of the L_1 test for the equality of several variances," Annals of Math. Stat., Vol. 10 (1939), pp. 119-128.
[33] E. J. G. Pitman, "A note on normal correlation," Biometrika, Vol. 31 (1939), pp. 9-12.
[34] W. A. Morgan, "A test for the significance of the difference between the two variances in a sample from a normal bivariate population," Biometrika, Vol. 31 (1939), pp. 13-19.
[35] D. J. Bishop, "On a comprehensive test for the homogeneity of variances and covariances in multivariate problems," Biometrika, Vol. 31 (1939), pp. 31-55.
[36] E. J. G. Pitman, "Tests of hypotheses concerning location and scale parameters," Biometrika, Vol. 31 (1939), pp. 200-215.
[37] P. O. Johnson and J. Neyman, "Tests of certain linear hypotheses and their application to some educational problems," Stat. Res. Mem., Vol. 1 (1936), pp. 57-93.
[38] P. L. Hsu, "On generalized analysis of variance (I)," Biometrika, Vol. 31 (1940), pp. 221-237.
[39] P. C. Tang, "The power function of the analysis of variance tests with tables and illustrations of their use," Stat. Res. Mem., Vol. 2 (1938), pp. 126-149.
[40] [Author illegible in the source], "Testing the homogeneity of a set of variances," Biometrika, Vol. 31 (1940).
[41] J. F. Daly, "On the unbiased character of likelihood ratio tests for independence in normal systems," Annals of Math. Stat., Vol. 11 (1940), pp. 1-32.
[42] C. T. Hsu, "On samples from a normal bivariate population," Annals of Math. Stat., Vol. 11 (1940), pp. 410-426.
[43] J. Neyman, "A statistical problem arising in routine analysis and in sampling inspections of mass production," Annals of Math. Stat., Vol. 12 (1941).
[44] A. Wald and R. J. Brookner, "On the distribution of Wilks' statistic for testing the independence of several groups of variates," Annals of Math. Stat., Vol. 12 (1941), pp. 137-152.
[45] P. L. Hsu, "Canonical reduction of the general regression problem," Annals of Eugenics, Vol. 11 (1941), pp. 42-46.


Part IV. Randomization Tests

[46] E. J. G. Pitman, "Significance tests which may be applied to samples from any population. (I)," Jour. Roy. Stat. Soc., Supp., Vol. 4 (1937), pp. 119-130.
[47] E. J. G. Pitman, "Significance tests which may be applied to samples from any population. (II). The correlation coefficient test," Jour. Roy. Stat. Soc., Supp., Vol. 4 (1937), pp. 225-232.
[48] E. J. G. Pitman, "Significance tests which may be applied to samples from any population. (III). The analysis of variance test," Biometrika, Vol. 29 (1938), pp. 322-335.
[49] B. L. Welch, "On the z-test in randomized blocks and Latin squares," Biometrika, Vol. 29 (1937), pp. 21-52.
[50] B. L. Welch, "On tests for homogeneity," Biometrika, Vol. 30 (1938), pp. 149-168.
[51] E. S. Pearson, "Some aspects of the problem of randomization," Biometrika, Vol. 29 (1938), pp. 53-64.

Note: None of the 1941 Biometrika was received until after this paper had been read and prepared for publication.



RECENT ADVANCES IN MATHEMATICAL STATISTICS, II¹

By Cecil C. Craig

University of Michigan


The statistical theory of the linear relationship between a dependent variable $x_1$ and a set of independent variables $x_2, x_3, \ldots, x_{r+1}$ is by now quite generally understood. Supposing that the $x_i$'s are measured from their respective means, we determine the coefficients $b_2, b_3, \ldots, b_{r+1}$ in such a way as to maximize the coefficient of correlation between $x_1$ and $b_2 x_2 + b_3 x_3 + \cdots + b_{r+1} x_{r+1}$. This coefficient of correlation, usually called the multiple correlation coefficient, measures the exactness of the linear relationship that exists, and it has the property of being quite unchanged if the origins or the scales for the separate $x_i$'s are changed in any way, or even if the set $x_2, x_3, \ldots, x_{r+1}$ should be replaced by any equivalent set of linear combinations of them. That is, e.g., if $r = 3$, new variables such as $v_1 = x_2 + x_3 + x_4$, together with two further linear combinations $v_2$ and $v_3$ of $x_2, x_3, x_4$, are equivalent to $x_2, x_3, x_4$ provided the latter can be found if the $v_i$ are known, and the multiple correlation between $x_1$ and the $v_i$'s is exactly the same as that between $x_1$ and $x_2, x_3, x_4$. Moreover, the requisite sampling theory, if the variables involved are normally distributed, is well established.
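The invariance just described is easy to check numerically. The Python sketch below (standard library only) computes the multiple correlation coefficient as the correlation between $x_1$ and its least-squares fit, and then repeats the computation after replacing $x_2, x_3, x_4$ by shifted and rescaled nonsingular linear combinations of them. The data, the particular combinations, and the helper names `solve` and `multiple_correlation` are illustrative assumptions, not anything taken from the paper.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def centered(v):
    mu = sum(v) / len(v)
    return [t - mu for t in v]

def corr(u, v):
    u, v = centered(u), centered(v)
    num = sum(a * b for a, b in zip(u, v))
    return num / (sum(a * a for a in u) * sum(b * b for b in v)) ** 0.5

def multiple_correlation(y, xs):
    """Correlation of y with its least-squares fit on the columns in xs (all centered)."""
    yc, xc = centered(y), [centered(x) for x in xs]
    normal = [[sum(p * q for p, q in zip(xi, xj)) for xj in xc] for xi in xc]
    rhs = [sum(p * q for p, q in zip(xi, yc)) for xi in xc]
    b = solve(normal, rhs)                      # least-squares coefficients b_2, b_3, ...
    fit = [sum(bj * xj[i] for bj, xj in zip(b, xc)) for i in range(len(y))]
    return corr(yc, fit)

random.seed(0)
n = 200
x2 = [random.gauss(0, 1) for _ in range(n)]
x3 = [random.gauss(0, 1) for _ in range(n)]
x4 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.8 * a - 0.5 * b + 0.3 * c + random.gauss(0, 1) for a, b, c in zip(x2, x3, x4)]

# An equivalent set: nonsingular linear combinations, with new origins and scales.
v1 = [a + b + c for a, b, c in zip(x2, x3, x4)]
v2 = [10.0 * (a - b) + 7.0 for a, b in zip(x2, x3)]
v3 = [a + 2.0 * c - 3.0 for a, c in zip(x2, x4)]

R_x = multiple_correlation(x1, [x2, x3, x4])
R_v = multiple_correlation(x1, [v1, v2, v3])
```

The two values `R_x` and `R_v` agree to rounding error; any other nonsingular choice of combinations would do equally well.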
I want to discuss briefly an important generalization of this kind of situation that has been the subject of recent research. In particular, in his paper "Relations between two sets of variates," published in Biometrika in 1936 [1], H. Hotelling set forth these ideas in excellent fashion and contributed much to the mathematical theory required for their practical application. We now suppose that we have two sets of measurements, $x_1, \ldots, x_s$ and $x_{s+1}, \ldots, x_{s+t}$, made on the same object, and that we are interested in the linear relations that may exist between the members of one set and the members of the other. As an example, $x_1, \ldots, x_s$ might be the prices of $s$ more or less related commodities at a given time, and $x_{s+1}, \ldots, x_{s+t}$ measures of factors which may be thought to be effective in the price situation.

In the more special case I began with, $s = 1$, and a single equation fully expressed the linear statistical relationship of $x_1$ with $x_2, \ldots, x_{t+1}$. Now there are $s$ dependent variables, and now, with $s \le t$, not one but $s$ distinct linear relations will exist and will be required to fully describe the linear connections between the two sets of variables. We may assume that there is no mere duplication among the variables we are using, i.e., no one of the $s$ $x_i$'s is always exactly given by a linear combination of the others in the set, and the same is true of the $x_{s+1}, \ldots, x_{s+t}$. Now there is no logical or mathematical necessity for the way in which we are so far using our measurements. Suppose $s = 2$ and $t = 3$. We can find the best linear regression equation for $x_1$ on $x_3, x_4, x_5$ and then find the like equation for $x_2$ on $x_3, x_4, x_5$. But we could very possibly get more meaning out of the situation if we began by replacing $x_1$ and $x_2$ by, say, $u_1 = x_1 + x_2$ and $u_2 = x_1 - x_2$, and similarly replacing $x_3, x_4, x_5$ by three $v$'s formed from these three $x$'s in a similar fashion. We have really been making a quite arbitrary choice among the $u$'s and $v$'s that could be used, and the question presents itself: What significance is there in the way we choose our $u$'s and $v$'s?

¹ This paper [...] on "Recent Advances in Mathematical Statistics" [...] was read at a joint meeting of the American Statistical Association, [...] and the Institute of Mathematical Statistics on December 30, 1941, in New York. Topics were selected from papers published during the past five years.

It turns out to be much more than a merely reasonable beginning to try to determine a $u$ from the first set and a $v$ from the second in such a way that they will be more closely correlated than any other $u$ and $v$ formed in this linear fashion from the $s$ $x$'s in the first set and the $t$ $x$'s in the second. That is, we set

$$u = \sum_{i=1}^{s} a_i x_i \qquad\text{and}\qquad v = \sum_{j=s+1}^{s+t} b_j x_j,$$

and determine the $a_i$'s and the $b_j$'s which will maximize $r_{uv}$. We may say that this $u$ and $v$ will account for more of the linear dependence of $x_1, \ldots, x_s$ upon $x_{s+1}, \ldots, x_{s+t}$ than will any other $u$ and $v$. To the mathematician, familiar things begin to appear, though, as Hotelling remarks, in its purely mathematical form the problem seems to be new. A very important observation is the fact that the maximum $r_{uv}$ would be quite unaffected by any change in origin or scale on any of the $x$'s; it is even unaffected if we should begin by replacing the first $s$ $x$'s by any equivalent set of $s$ linear combinations of them as new variables to work with, and by doing the same thing on the second set of $t$ $x$'s. Hotelling makes use of this circumstance to greatly simplify his mathematical developments.

Now things fall out in a very interesting way. One actually solves not for the $a$'s and $b$'s at first, but instead for the maximized $r_{uv}$. Having this, the corresponding $a$'s and $b$'s can then be found. But generally the equation for $r_{uv}$ gives not one but $s$ different values for $r_{uv}$! What is the meaning of the $s$ different $r_{uv}$'s? Well, you remember that I said that $s$ relations ($s \le t$) would appear to exist between the two sets of variables. These $s$ $r_{uv}$'s correspond to those $s$ linear relations, which are picked out in a unique way. We now have $s$ $u, v$ pairs which are independent of each other in the sense that no $u$ or $v$ is correlated with any other $u$ or $v$ with the exception of the other member of its pair, and of course this correlation is precisely the $r_{uv}$ by which the pair was determined. Further, the largest $r_{uv}$ gives the maximum $u$ and $v$ we set out to find; the second largest $r_{uv}$ determines the pair $u, v$ of maximum correlation among those independent, in the sense just described, of the first pair; the third largest $r_{uv}$ leads to the $u, v$ of maximum correlation among those independent of the first two pairs; and so on. The $s$ independent linear relations among them completely describe the linear statistical dependence of the one set of variables upon the other. The relations are essentially those between the $u, v$ pairs, and the closeness of these is measured by $r_1, r_2, \ldots, r_s$, which I write for the $s$ $r_{uv}$'s. The new variables are called canonical variables and the correlations between them canonical correlations. We may say that the maximum pair, $u, v$, gives both the best linear predictor that can be formed from $x_{s+1}, \ldots, x_{s+t}$ and also the linear combination of $x_1, \ldots, x_s$ that can be best predicted.
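A minimal sketch of how the $r_{uv}$'s can be obtained from data in the case $s = 2$: the squared canonical correlations are the roots (eigenvalues) of $S_{11}^{-1} S_{12} S_{22}^{-1} S_{21}$, where the $S$'s are the covariance blocks of the two sets. This eigenvalue formulation is the standard modern one and is assumed here rather than quoted from the paper; with $s = 2$ the roots come from a quadratic, so only the Python standard library is needed. The simulated data and the names `canonical_correlations_2`, `inverse`, `block` are hypothetical.

```python
import math
import random

def cov(cols):
    n = len(cols[0])
    mus = [sum(c) / n for c in cols]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ci, cj)) / (n - 1)
             for cj, mb in zip(cols, mus)] for ci, ma in zip(cols, mus)]

def inverse(a):
    """Gauss-Jordan inverse of a small nonsingular matrix."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def block(s, rows, cols):
    return [[s[i][j] for j in cols] for i in rows]

def canonical_correlations_2(cols):
    """r1 >= r2 between the first two columns and the remaining t columns."""
    S = cov(cols)
    one, two = [0, 1], list(range(2, len(cols)))
    M = matmul(matmul(inverse(block(S, one, one)), block(S, one, two)),
               matmul(inverse(block(S, two, two)), block(S, two, one)))
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    squares = [(tr + disc) / 2, (tr - disc) / 2]   # roots = squared canonical correlations
    return [math.sqrt(max(v, 0.0)) for v in squares]

random.seed(1)
n = 300
z1 = [random.gauss(0, 1) for _ in range(n)]
z2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [a + random.gauss(0, 1) for a in z1]
x2 = [b + random.gauss(0, 1) for b in z2]
x3 = [a + 0.5 * b + random.gauss(0, 1) for a, b in zip(z1, z2)]
x4 = [0.3 * b + random.gauss(0, 1) for b in z2]
x5 = [random.gauss(0, 1) for _ in range(n)]

r = canonical_correlations_2([x1, x2, x3, x4, x5])

# Replacing x1, x2 by u1 = x1 + x2, u2 = x1 - x2 leaves the canonical correlations unchanged.
u1 = [a + b for a, b in zip(x1, x2)]
u2 = [a - b for a, b in zip(x1, x2)]
r_u = canonical_correlations_2([u1, u2, x3, x4, x5])
```

The last lines illustrate the invariance that Hotelling exploits: a nonsingular change of variables within the first set only changes the $2 \times 2$ matrix by a similarity transformation, so its roots stay the same.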

I have to try to deal briefly with the numerous ideas and results in this paper, which is not unrelated to earlier work by the author and by S. S. Wilks. First, what about an over-all measure of the linear connection between the two sets of variables? It is shown that

$$q = \pm r_1 r_2 \cdots r_s \qquad\text{and}\qquad z = (1 - r_1^2)(1 - r_2^2) \cdots (1 - r_s^2)$$

have properties that make it appropriate to call the first the (vector) correlation coefficient between the two sets and the second the coefficient of alienation. Both are simply expressed by means of determinants of the covariances (product moments) among the $s + t$ $x$'s. For example, if $s = 1$, $q$ is simply the multiple correlation coefficient of $x_1$ with $x_2, \ldots, x_{t+1}$. If $s = t = 2$,

$$q = \frac{r_{13} r_{24} - r_{14} r_{23}}{\sqrt{(1 - r_{12}^2)(1 - r_{34}^2)}},$$

the numerator of which is the tetrad difference of the psychologists. Further, if it should happen that $x_2$ and $x_4$ are identical, this $q$ becomes the partial correlation $r_{13\cdot 2}$.
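The formulas for $s = t = 2$ can be checked against each other numerically. The sketch below (standard library Python; the correlation values are made up, and `canonical_2x2`, `q_tetrad`, `corr_matrix` are hypothetical helper names) computes $r_1, r_2$ as square roots of the roots of $R_{11}^{-1} R_{12} R_{22}^{-1} R_{21}$. It compares $|q|$ with the tetrad formula and $z$ with the determinant ratio $|R| / (|R_{11}|\,|R_{22}|)$, then sets $x_4 \equiv x_2$ to recover the partial correlation $r_{13\cdot 2}$.

```python
import math

def inv2(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    m, d = [list(map(float, row)) for row in a], 1.0
    n = len(m)
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[p][c] == 0.0:
            return 0.0
        if p != c:
            m[c], m[p] = m[p], m[c]
            d = -d
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def corr_matrix(r12, r13, r14, r23, r24, r34):
    return [[1, r12, r13, r14], [r12, 1, r23, r24], [r13, r23, 1, r34], [r14, r24, r34, 1]]

def canonical_2x2(R):
    """Canonical correlations between (x1, x2) and (x3, x4)."""
    R11 = [row[:2] for row in R[:2]]
    R12 = [row[2:] for row in R[:2]]
    R21 = [row[:2] for row in R[2:]]
    R22 = [row[2:] for row in R[2:]]
    M = mul2(mul2(inv2(R11), R12), mul2(inv2(R22), R21))
    tr, dt = M[0][0] + M[1][1], M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(tr * tr - 4 * dt, 0.0))
    return [math.sqrt(max((tr + sign * disc) / 2, 0.0)) for sign in (1, -1)]

def q_tetrad(R):
    tetrad = R[0][2] * R[1][3] - R[0][3] * R[1][2]      # the tetrad difference
    return tetrad / math.sqrt((1 - R[0][1] ** 2) * (1 - R[2][3] ** 2))

R = corr_matrix(0.3, 0.5, 0.4, 0.2, 0.6, 0.35)
r1, r2 = canonical_2x2(R)
q = q_tetrad(R)
z = (1 - r1 ** 2) * (1 - r2 ** 2)
z_det = det(R) / ((1 - R[0][1] ** 2) * (1 - R[2][3] ** 2))

# x4 identical to x2: r24 = 1, r14 = r12, r34 = r23.
R_same = corr_matrix(0.3, 0.5, 0.3, 0.2, 1.0, 0.2)
partial_13_2 = (0.5 - 0.3 * 0.2) / math.sqrt((1 - 0.3 ** 2) * (1 - 0.2 ** 2))
```

In the degenerate case one canonical correlation is 1 (the common variable), so the product $r_1 r_2$ reduces to $|r_{13\cdot 2}|$.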

In an application, of course, the various quantities appearing above will have to be calculated from an observed set of values of $x_1, \ldots, x_s, x_{s+1}, \ldots, x_{s+t}$. Hotelling adapts an iterative process he had previously given to calculating the canonical $r_1, r_2, \ldots$, from which the canonical variables can be found, and he numerically illustrates the whole procedure. But what is more difficult is to solve the sampling problems that arise. It is very helpful to assume that all the $x$'s obey a multiple normal frequency law.
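Hotelling's own computing scheme is not reproduced here. As a hedged illustration of the general idea of an iterative solution, the sketch below uses plain power iteration, which is our assumption and not necessarily his exact procedure: starting from trial weights for $u$, it alternately computes the best $v$ for the current $u$ and the best $u$ for that $v$. The weights converge to those of the first canonical pair, and their correlation to $r_1$. The correlation blocks are made-up values and `leading_canonical_pair` is a hypothetical name.

```python
import math

def inv2(a):
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def mul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

# Illustrative correlation blocks for (x1, x2) and (x3, x4).
R11 = [[1.0, 0.3], [0.3, 1.0]]
R12 = [[0.5, 0.4], [0.2, 0.6]]
R21 = [list(col) for col in zip(*R12)]
R22 = [[1.0, 0.35], [0.35, 1.0]]

def leading_canonical_pair(iters=200):
    A, B = inv2(R11), inv2(R22)
    a = [1.0, 1.0]                                # trial weights for u
    for _ in range(iters):
        b = matvec(B, matvec(R21, a))             # best v-weights for this u
        a = matvec(A, matvec(R12, b))             # best u-weights for that v
        norm = math.sqrt(dot(a, a))
        a = [t / norm for t in a]
    b = matvec(B, matvec(R21, a))
    cov_uv = dot(a, matvec(R12, b))
    r = cov_uv / math.sqrt(dot(a, matvec(R11, a)) * dot(b, matvec(R22, b)))
    return a, b, r

a, b, r1 = leading_canonical_pair()

# Closed form for comparison: the largest root of R11^-1 R12 R22^-1 R21.
M = mul2(mul2(inv2(R11), R12), mul2(inv2(R22), R21))
tr, dt = M[0][0] + M[1][1], M[0][0] * M[1][1] - M[0][1] * M[1][0]
r1_closed = math.sqrt((tr + math.sqrt(tr * tr - 4 * dt)) / 2)
```

Convergence is geometric, at a rate set by the ratio of the second to the first squared canonical correlation, so a few hundred steps are far more than enough for blocks like these.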

First, Hotelling derives expressions for the standard errors of the $r$'s and of $q$ and $z$ which are approximations useful for large samples. But for small samples exact sampling distributions are needed. Wilks [2] had earlier studied the exact sampling distribution of $z$ in the case in which we are interested, that in which in the population the set $x_1, \ldots, x_s$ is completely independent of the set $x_{s+1}, \ldots, x_{s+t}$, though he did not leave his general result in a form suitable for calculation. Hotelling now finds the distribution function for $q$ for $s = 2$. The result is not in all cases simple in form, but numerical values can be obtained from it. The relations between these two possible tests, one based on $z$ and the other based on $q$, are discussed at length.

An obvious undertaking would be to try to find the exact joint sampling distribution of the canonical correlations for any $s$ and $t$, and I will say something about the very interesting papers in which this problem was solved. But some of this later work arose in a different though related setting, which I want to discuss briefly first.

In 1936 R. A. Fisher published "The use of multiple measurements in taxonomic problems" [3], which was the introduction of linear discriminant functions to the world. Suppose that $N_1$ random individuals of one race (species, variety, etc.) have been measured with respect to each of $k$ characteristics and that $N_2$ random individuals of another race have been similarly measured. What linear combination of these measurements would serve best to distinguish members of one race from those of the other? An example used by Fisher in this paper was that of two samples of 50 plants each of two varieties of iris found growing together in the same colony. In the flower on each plant there were measured the sepal length, $x_1$, the sepal width, $x_2$, the petal length, $x_3$, and the petal width, $x_4$. What linear function

$$X = \lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3 + \lambda_4 x_4$$

would enable one to most surely identify the variety to which each single plant belongs? To choose such an $X$, Fisher proposed the mathematical principle that the coefficients $\lambda_i$, $i = 1, 2, 3, 4$, be determined so that the square of the difference between the average value of $X$ in the one variety and the average value in the other, divided by the sum of squares of the $X$'s taken about the two group means, shall be a maximum. Then quite simple mathematics leads to the required numerical values of the $\lambda_i$'s.
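In modern matrix notation (our presentation, not Fisher's wording) the maximizing coefficients are proportional to $W^{-1}(\bar{x}^{(1)} - \bar{x}^{(2)})$, where $W$ is the matrix of within-group sums of squares and products. The sketch below uses two measurements and synthetic data rather than Fisher's iris measurements. It computes the coefficients this way and confirms that no other direction gives a larger value of Fisher's ratio. The function names are hypothetical.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def within_ssp(groups):
    """Pooled within-group sums of squares and products for two measurements."""
    W = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        m = [mean([p[i] for p in g]) for i in range(2)]
        for p in g:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    W[i][j] += d[i] * d[j]
    return W

def fisher_ratio(lam, g1, g2):
    """(difference of the group means of X)^2 / within-group sum of squares of X."""
    X1 = [lam[0] * p[0] + lam[1] * p[1] for p in g1]
    X2 = [lam[0] * p[0] + lam[1] * p[1] for p in g2]
    m1, m2 = mean(X1), mean(X2)
    S = sum((x - m1) ** 2 for x in X1) + sum((x - m2) ** 2 for x in X2)
    return (m1 - m2) ** 2 / S

def fisher_coefficients(g1, g2):
    """lambda proportional to W^{-1} d, with d the difference of the group mean vectors."""
    W = within_ssp([g1, g2])
    d = [mean([p[i] for p in g1]) - mean([p[i] for p in g2]) for i in range(2)]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    return [(W[1][1] * d[0] - W[0][1] * d[1]) / det,
            (W[0][0] * d[1] - W[1][0] * d[0]) / det]

random.seed(3)

def plant(mu1, mu2):
    """One synthetic 'plant' with two correlated measurements."""
    a = random.gauss(0, 1)
    return (mu1 + 0.5 * a, mu2 + 0.3 * a + 0.3 * random.gauss(0, 1))

g1 = [plant(5.0, 3.4) for _ in range(50)]
g2 = [plant(5.9, 2.8) for _ in range(50)]
lam = fisher_coefficients(g1, g2)
best = fisher_ratio(lam, g1, g2)
others = [fisher_ratio((math.cos(t / 100), math.sin(t / 100)), g1, g2) for t in range(629)]
```

Because the ratio is unchanged when all the $\lambda_i$ are multiplied by a constant, only the direction of $\lambda$ matters, which is why the comparison scans unit vectors.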

But now that we have set up such an instrument as $X$, there is a more interesting use to which it can be put. Suppose that the question were to establish that the $N_1$ individuals from the one group and the $N_2$ individuals from the other really belong to different races, distinguishable with respect to the complex of characters we have chosen to measure in each. We are on the old question of racial likeness or unlikeness, and obviously the word "race" may have a meaning broad enough to give this work of Fisher's wide application indeed. Subject to the principle according to which the coefficients $\lambda_i$ are determined from sample sets of measurements, $X$ is the best possible linear discriminant function. We are now faced with the question of the statistical significance of the difference between the means of $X$ for each group compared to the above-mentioned internal sum of squares.

It is generally useful and enlightening, in a problem of this general nature turning on the use of linear and quadratic forms, to consider its interpretation as an analysis of variance or covariance. Fisher readily provides such a set-up in this case by assigning to the quality of belonging to race A a numerical value, $y_1$, the same for all members of that race, and by assigning in like fashion a different numerical value, $y_2$, to the quality of belonging to race B. It is mathematically convenient, if we have samples of $N_1$ and $N_2$ from races A and B respectively, to let

$$y_1 = \frac{N_2}{N_1 + N_2} \qquad\text{and}\qquad y_2 = -\frac{N_1}{N_1 + N_2},$$

for then over the combined sample of $N_1 + N_2$ we have

$$S(y) = 0 \qquad\text{and}\qquad S(y^2) = \frac{N_1 N_2}{N_1 + N_2}.$$
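The convenience of this choice is a matter of exact arithmetic, which Python's `fractions` module confirms for any group sizes. The helper name `coded_sums` and the sample sizes are illustrative.

```python
from fractions import Fraction

def coded_sums(n1, n2):
    """Return S(y) and S(y^2) over the combined sample for Fisher's coding of the two races."""
    y1 = Fraction(n2, n1 + n2)     # value given to each member of race A
    y2 = -Fraction(n1, n1 + n2)    # value given to each member of race B
    ys = [y1] * n1 + [y2] * n2
    return sum(ys), sum(y * y for y in ys)

for n1, n2 in [(50, 50), (3, 7), (12, 5)]:
    print(n1, n2, *coded_sums(n1, n2))
```

For example, with $N_1 = 3$ and $N_2 = 7$ the sum of squares is $21/10 = N_1 N_2/(N_1 + N_2)$, while the plain sum vanishes.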



This may seem somewhat arbitrary at first glance, but let us start anew by writing the linear regression equation

$$y = \sum_{i=1}^{k} b_i (x_i - \bar{x}_i),$$

in which $y$ takes on one of the two values above and in which $\bar{x}_i$ is the mean of $x_i$ in the combined sample, and then proceed to determine the $b_i$'s in the usual least squares fashion. The $b_i$'s turn out to be proportional to the $\lambda_i$'s previously found. Now the total variance of the $y$'s is analyzed into that within groups and that between groups, and it is immediately suggested that the usual $z$-test with $k$ and $N - k - 1$ degrees of freedom, where $N = N_1 + N_2$, is the appropriate one. But, as Fisher remarks, ordinarily for the application of this test one postulates a population in which the $y$'s have a normal distribution for each fixed set of values of $x_1, x_2, \ldots, x_k$. Here, however, the $y$ remains fixed and one postulates a normal distribution of the $x$'s associated with a given value of $y$. Not to leave this matter in doubt, though I shall return to it, I may remark that Fisher noted that earlier work by Hotelling [4] showed that the $z$-test is nevertheless the proper one to use.
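The proportionality of the $b_i$'s to the $\lambda_i$'s can be checked directly. This sketch (synthetic two-variable data, standard library only, hypothetical names) regresses Fisher's coded $y$ on the centered $x$'s by least squares and compares the result with $W^{-1}(\bar{x}^{(1)} - \bar{x}^{(2)})$, the within-group form of the discriminant coefficients.

```python
import random

def solve2(a, r):
    """Solve the 2 x 2 system a x = r."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * r[0] - a[0][1] * r[1]) / det,
            (a[0][0] * r[1] - a[1][0] * r[0]) / det]

def group_mean(g):
    return [sum(p[i] for p in g) / len(g) for i in range(2)]

random.seed(4)
g1 = [(random.gauss(5.0, 1.0), random.gauss(3.4, 0.6)) for _ in range(40)]
g2 = [(random.gauss(5.9, 1.0), random.gauss(2.8, 0.6)) for _ in range(30)]
n1, n2 = len(g1), len(g2)
N = n1 + n2

# Least squares regression of the coded y on the centered x's.
pts = g1 + g2
ys = [n2 / N] * n1 + [-n1 / N] * n2
xbar = group_mean(pts)
dev = [[p[i] - xbar[i] for i in range(2)] for p in pts]
T = [[sum(d[i] * d[j] for d in dev) for j in range(2)] for i in range(2)]
b = solve2(T, [sum(d[i] * y for d, y in zip(dev, ys)) for i in range(2)])

# Fisher's coefficients from the pooled within-group sums of squares and products.
m1, m2 = group_mean(g1), group_mean(g2)
W = [[sum((p[i] - m[i]) * (p[j] - m[j]) for g, m in ((g1, m1), (g2, m2)) for p in g)
      for j in range(2)] for i in range(2)]
lam = solve2(W, [m1[0] - m2[0], m1[1] - m2[1]])

ratios = [bi / li for bi, li in zip(b, lam)]   # equal ratios mean b is a multiple of lambda
```

The reason is that the total sums of squares and products equal $W$ plus a multiple of $dd^\top$, so inverting them changes only the length, not the direction, of $W^{-1}d$.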

I have to be brief indeed concerning linear discriminant functions. Fisher wrote further papers dealing with them in 1938 [5], 1939 [6], and 1940 [7], and among others Mahalanobis [8], Bose [9, 10], and Roy [10], of the "Calcutta School," have made relevant contributions. In particular, Mahalanobis [8] introduced the concept of the generalized distance by which two sets of multiple measurements differ, which has an obvious connection with the present subject. Fisher also discussed a test for the direction in $k$-space in which two such samples differ most, and, in case we have three such samples from three different races, provided a test for their collinearity.
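As a hedged illustration of the connection, the sketch below computes a sample version of the generalized distance, $D^2 = d^\top V^{-1} d$, with $d$ the difference of the two mean vectors and $V$ the pooled within-sample covariance matrix. This sample form is our assumption; Mahalanobis' definition is stated for population values. With $\lambda = V^{-1} d$, $D^2$ is just the difference between the two sample means of the discriminant function, and, unlike ordinary Euclidean distance, it does not change when a measurement is expressed in other units. Data and names are hypothetical.

```python
import random

def mean_vec(g):
    return [sum(p[i] for p in g) / len(g) for i in range(2)]

def pooled_cov(g1, g2):
    m1, m2 = mean_vec(g1), mean_vec(g2)
    dof = len(g1) + len(g2) - 2
    return [[sum((p[i] - m[i]) * (p[j] - m[j]) for g, m in ((g1, m1), (g2, m2)) for p in g) / dof
             for j in range(2)] for i in range(2)]

def solve2(a, r):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * r[0] - a[0][1] * r[1]) / det,
            (a[0][0] * r[1] - a[1][0] * r[0]) / det]

def generalized_distance(g1, g2):
    """Return D^2 = d' V^{-1} d and the discriminant coefficients lambda = V^{-1} d."""
    m1, m2 = mean_vec(g1), mean_vec(g2)
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    lam = solve2(pooled_cov(g1, g2), d)
    return d[0] * lam[0] + d[1] * lam[1], lam

def rescale(g):
    """Measure the first character in units ten times smaller."""
    return [(10.0 * p[0], p[1]) for p in g]

random.seed(5)
g1 = [(random.gauss(5.0, 1.0), random.gauss(3.4, 0.6)) for _ in range(40)]
g2 = [(random.gauss(5.9, 1.0), random.gauss(2.8, 0.6)) for _ in range(30)]
D2, lam = generalized_distance(g1, g2)

# The difference between the sample means of X = lam . x equals D^2.
X1 = sum(lam[0] * p[0] + lam[1] * p[1] for p in g1) / len(g1)
X2 = sum(lam[0] * p[0] + lam[1] * p[1] for p in g2) / len(g2)
gap = X1 - X2

D2_rescaled, _ = generalized_distance(rescale(g1), rescale(g2))
```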

In his 1939 paper mentioned above [6], Fisher called attention to the connections between the theory of linear discriminant functions and Hotelling's canonical correlations. Of course it can be said at once that a linear discriminant function arises as the very special case of investigating the linear relationship between the artificially introduced $y$ and $x_1, x_2, \ldots, x_k$. And the test of significance based on the analysis of variance turns on the ratio of the sum of squares due to regression, i.e., among the predicted values, to the total sum of squares for the regression and for the residuals. This analysis is quite general in form and can equally well be set up if one is predicting linear forms made up from $N_1$ variables from linear forms made up from $N_2$ other variables. If one sets up the condition that this ratio, $d$, be a maximum, one is led, as Fisher shows, to a determinantal equation in $d$, the roots of which are the squares of Hotelling's canonical correlations.

Mathematically the general problem we are interested in is equivalent to the following: We have a sample of $N_1 + N_2$ observed values of $p$ normally distributed variables. If $a_{ij}$ is the covariance of the $i$-th and $j$-th variables in the sample of $N_1$ and $b_{ij}$ the like covariance in the sample of $N_2$, we want the sampling distribution of the roots of the determinantal equation

$$\left| a_{ij} - d\,(a_{ij} + b_{ij}) \right| = 0,$$

under the hypothesis that the first sample is independent of the second. This problem Fisher solved in his 1939 paper, though in his characteristically concise and intuitive manner. But in the same number of the Annals of Eugenics, P. L. Hsu [11], at Fisher's suggestion, gave a complete analytical solution. Hsu also showed in more detail how the result applies to Hotelling's case of $N$ observations on $s + t$ normally distributed variables in which the set of $s$ is independent of the second set of $t$. In his 1936 paper Hotelling gave the result for $s = t = 2$, and in 1939 Girshick [12] gave the solution for $s = 2$ and $t > 2$.
Hsu showed, too, the striking fact, mentioned by Fisher, that it is sufficient that only one of the two sets of $s$ and of $t$ variables be normally distributed in order that the distribution function found apply. This provides the explanation of why the test of significance applied by Fisher for linear discriminant functions is valid even though the $y$ introduced had an arbitrary distribution of values.
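For $p = 2$ the determinantal equation is a quadratic in $d$: with $C = A + B$ symmetric, $|A - dC| = |A| - d\,(a_{11}c_{22} + a_{22}c_{11} - 2a_{12}c_{12}) + d^2 |C|$. The sketch below solves it for two illustrative positive definite matrices (the numbers and the name `determinantal_roots` are hypothetical). It checks that both roots make the determinant vanish and lie between 0 and 1, as squared correlations must.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def determinantal_roots(A, B):
    """Roots d of |A - d(A + B)| = 0 for symmetric 2 x 2 matrices A and B."""
    C = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
    mid = A[0][0] * C[1][1] + A[1][1] * C[0][0] - 2 * A[0][1] * C[0][1]
    # the quadratic |C| d^2 - mid d + |A| = 0
    disc = math.sqrt(mid * mid - 4 * det2(C) * det2(A))
    return [(mid - disc) / (2 * det2(C)), (mid + disc) / (2 * det2(C))]

A = [[4.0, 1.0], [1.0, 3.0]]      # illustrative matrix playing the role of (a_ij)
B = [[2.0, 0.5], [0.5, 5.0]]      # illustrative matrix playing the role of (b_ij)
roots = determinantal_roots(A, B)
residuals = [det2([[A[i][j] - d * (A[i][j] + B[i][j]) for j in range(2)] for i in range(2)])
             for d in roots]
```

Here the roots come out as $33/91.5 \approx 0.361$ and exactly $2/3$.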

The simultaneous distribution of the canonical correlations is fundamental, but on finding it not all difficulties are thereby resolved. As mentioned above, either of the quantities $z$ or $q$, as they appear in Hotelling's paper, furnishes over-all tests, or rather would if their distribution functions were obtained in a satisfactory form. The form of the distribution of $z$ for complete independence was given by Wilks as early as 1932 [2], but that of $q$ for $s > 2$ is still lacking. For $s > 2$ there are difficulties in applications even with $z$, and in 1938 [13] M. S. Bartlett proposed a more convenient approximate test. Ordinarily, however, one would want to test the largest canonical correlation alone for significance. There are two kinds of trouble here. First, there is no assurance that the largest observed canonical correlation corresponds to the largest one in the population. Second, it is quite important to know whether the remaining population correlations are zero or not. Bartlett in 1941 [14] discussed these points.
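The approximation usually attributed to Bartlett refers $-\left[N - 1 - \tfrac{1}{2}(s + t + 1)\right] \ln z$ to a $\chi^2$ distribution with $st$ degrees of freedom. We have not checked that this is exactly the form given in [13], so treat the multiplier as an assumption. The sketch computes the statistic and a $\chi^2$ tail probability with the standard library only, using the recurrence $Q_{k+2}(x) = Q_k(x) + (x/2)^{k/2} e^{-x/2} / \Gamma(k/2 + 1)$ started from $k = 1$ or $k = 2$. The canonical correlations fed in are made up, and the function names are hypothetical.

```python
import math

def chi2_sf(x, k):
    """P(chi-square with k degrees of freedom > x), built up from k = 1 or k = 2."""
    if k % 2:
        q, j = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, j = math.exp(-x / 2), 2
    while j < k:
        q += (x / 2) ** (j / 2) * math.exp(-x / 2) / math.gamma(j / 2 + 1)
        j += 2
    return q

def bartlett_test(rs, N, s, t):
    """Approximate test that all population canonical correlations vanish."""
    z = 1.0
    for r in rs:
        z *= 1 - r * r                       # the coefficient of alienation
    stat = -(N - 1 - (s + t + 1) / 2) * math.log(z)
    df = s * t
    return stat, df, chi2_sf(stat, df)

stat, df, p = bartlett_test([0.6, 0.3], N=100, s=2, t=3)
```

This over-all test does not address the two difficulties just mentioned about testing the largest correlation alone.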

Now I make an abrupt change in subject. Some interesting work has been done on the theory of runs and its applications during the last five years.

First, I want to try to convey some idea of the contents of three papers by W. O. Kermack and A. G. McKendrick published in 1937 [15, 16] and 1938 [17]. Suppose we have an unlimited set of numbers, no two of which are equal, and start drawing from them at random, recording the numbers in sequence as they come. Within the sequence drawn there will occur runs up and runs down of varying lengths. Thus in the sequence of 10 numbers 2, 5, 11, 8, 9, 4, 3, 7, 14, 12 there are 3 runs up, one of length 2 and 2 of length 3, and 3 runs down, 2 of length 2 and one of length 3. Both ends of a run are counted in finding its length, so no run can have a length less than 2. The total number of runs is 6, of which 3 are of length 2 and 3 are of length 3. We can also count the gaps, which extend from crest to crest or from trough to trough, and note their lengths with the convention that again both ends are counted in determining a length, so that no gap length is less than 3. Thus in the sequence of 10 numbers above there is one gap of length 3, 3 of length 4, and one of length 5.
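The counting conventions can be made explicit in code. In this sketch (hypothetical function names) a run ends where the direction of change reverses, and both end terms are counted. For gaps we assume that the first and last terms count as a crest or a trough according to whether they exceed or fall short of their single neighbour; that endpoint convention is our reading, not a quotation from the papers. With it, the sketch reproduces the run counts given above and finds gap lengths 3, 4, 4, 4 and 5.

```python
def runs_up_down(seq):
    """Lengths of the runs up and the runs down, both end terms counted."""
    ups, downs, start = [], [], 0
    for i in range(1, len(seq)):
        rising = seq[i] > seq[i - 1]
        # a run ends where the direction changes or where the sequence ends
        if i == len(seq) - 1 or (seq[i + 1] > seq[i]) != rising:
            (ups if rising else downs).append(i - start + 1)
            start = i
    return ups, downs

def gap_lengths(seq):
    """Crest-to-crest and trough-to-trough lengths, both ends counted.

    Assumption: an end term is a crest or a trough according to its one neighbour.
    """
    crests, troughs = [], []
    for i, v in enumerate(seq):
        nbrs = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        if all(v > w for w in nbrs):
            crests.append(i)
        elif all(v < w for w in nbrs):
            troughs.append(i)
    return [b - a + 1 for pts in (crests, troughs) for a, b in zip(pts, pts[1:])]

seq = [2, 5, 11, 8, 9, 4, 3, 7, 14, 12]
ups, downs = runs_up_down(seq)
gaps = gap_lengths(seq)
```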

It is clear that if we know the distribution for runs or for gaps of different lengths, we can compare an observed sequence, or rather an observed distribution of runs or gaps by lengths, with the frequencies calculated on the hypothesis of randomness, and so have a way of acquiring a test for this hypothesis. In brief, in these papers these theoretical distributions are found, together with their means and variances. There are some interesting applications. [...] random sampling numbers and a series of [...] numbers [...] passed the $\chi^2$-test as random and also passed the test based on the departure of the mean from its expected value compared with its standard deviation. On the other hand, the series of Swedish death rates for the period [...] could not conceivably be random. This investigation was prompted in the first place by the fluctuations of the death rate from [...] in an experimentally induced epidemic.
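As a check on the kind of result these papers contain (not a reproduction of Kermack and McKendrick's own formulas), the standard results for the total number of runs up and down among $n$ distinct numbers in random order, mean $(2n - 1)/3$ and, for $n \ge 4$, variance $(16n - 29)/90$, can be verified by exhaustive enumeration of all $n!$ orderings. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations

def number_of_runs(seq):
    """Total number of runs up and down: one more than the number of changes of direction."""
    rising = [b > a for a, b in zip(seq, seq[1:])]
    return 1 + sum(p != q for p, q in zip(rising, rising[1:]))

def exact_moments(n):
    """Exact mean and variance of the number of runs over all n! equally likely orders."""
    counts = [number_of_runs(p) for p in permutations(range(n))]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean * mean
    return mean, var

for n in (4, 5, 6):
    print(n, *exact_moments(n))
```

An observed number of runs far from $(2n - 1)/3$, measured in units of the standard deviation, is the simplest version of the randomness test described above.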

The problems here dealt with had been only partially solved [...]. There is much interesting material in these papers for which I have no space here. The authors readily include the case in which the numbers composing the population are not all different. They also studied series of limited length, arranged in a cycle or ring, and even what may be termed a Möbius cycle.

A. M. Mood in 1940 [18], in an interesting paper, investigated a different form of the problem of runs. Suppose we have $n$ elements of two kinds, $n_1$ $a$'s and $n_2 = n - n_1$ $b$'s, and that these are arranged at random in a row. For example, if $n_1 = 5$ and $n_2 = 7$, and if a random arrangement of these 12 $a$'s and $b$'s is babbbabbaaab, the $a$'s occur in 2 runs of one and in one run of 3, and the $b$'s come in 2 runs of one, in one run of 2 and in one run of 3. If $r_{ij}$ ($i = 1, 2$) is the number of runs of length $j$ of elements of kind $i$, Mood finds the probability of obtaining a given set of values of the $r_{ij}$ such that $\sum_j j\, r_{ij} = n_i$ ($i = 1, 2$), i.e., the probability of obtaining a given pattern of runs in the two kinds of objects. From this basic distribution function he obtains certain marginal distributions, such as that for the occurrence of a given set of runs in the $a$'s regardless of how the $b$'s fall (except that they must provide the necessary points of division), or that for $r_1$ and $r_2$ if these are respectively the total numbers of runs of $a$'s and of $b$'s, or that for $r_1$ or $r_2$ alone. He finds the factorial moments of these variables and from them their means, variances and covariances. Similar results are obtained for more than two kinds of elements. In [...] part of the article Mood treats drawings from an infinite population in which two or more kinds of elements occur in given proportions. Finally, in both of these cases he studies the limiting distributions of the functions studied as the sample size increases. As Mood notes, here too [...].
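Mood's probability for a complete pattern of runs can be written down by a counting argument. The $r_1$ runs of $a$'s can be ordered in $r_1!/\prod_j r_{1j}!$ ways and the $r_2$ runs of $b$'s in $r_2!/\prod_j r_{2j}!$ ways. The two sequences of runs interleave in 2 ways if $r_1 = r_2$, in 1 way if they differ by one, and not at all otherwise. Dividing by $\binom{n}{n_1}$ gives the probability. We present this formula as our reading of the result rather than a quotation from [18]. The sketch checks it against exhaustive enumeration for $n_1 = 5$, $n_2 = 7$, and counts the runs in the arrangement used above; helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations
from math import comb, factorial

def run_pattern(s):
    """Map each symbol to the sorted tuple of its run lengths."""
    runs, i = {}, 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        runs.setdefault(s[i], []).append(j - i)
        i = j
    return {sym: tuple(sorted(v)) for sym, v in runs.items()}

def orderings(lengths):
    """r! / prod_j r_j!: the number of distinct orders of a multiset of run lengths."""
    out = factorial(len(lengths))
    for c in Counter(lengths).values():
        out //= factorial(c)
    return out

def pattern_probability(a_runs, b_runs, n1, n2):
    r1, r2 = len(a_runs), len(b_runs)
    interleave = 2 if r1 == r2 else (1 if abs(r1 - r2) == 1 else 0)
    return Fraction(orderings(a_runs) * orderings(b_runs) * interleave, comb(n1 + n2, n1))

n1, n2 = 5, 7
total = comb(n1 + n2, n1)
tally = Counter()
for pos in combinations(range(n1 + n2), n1):
    s = "".join("a" if i in pos else "b" for i in range(n1 + n2))
    pat = run_pattern(s)
    tally[(pat["a"], pat["b"])] += 1
```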

In a paper antedating Mood's by some six months, A. Wald and J. Wolfowitz [19] obtained the distribution function for the total number of runs in a random arrangement of given numbers of two kinds of elements, to provide a test of the hypothesis that two samples come from the same population with a continuous distribution law. If the observations in the two samples combined are arranged in order of magnitude, and if then the observations from the first sample are each replaced by a zero and those from the second are each replaced by a one, we have a situation to which this distribution function for runs applies. W. L. Stevens [20] also discussed an application of this distribution.
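A sketch of the two-sample runs test described above, using the standard formula for the distribution of the total number of runs $R$: for $R = 2k$ the count of arrangements is $2\binom{n_1-1}{k-1}\binom{n_2-1}{k-1}$, and for $R = 2k + 1$ it is $\binom{n_1-1}{k}\binom{n_2-1}{k-1} + \binom{n_1-1}{k-1}\binom{n_2-1}{k}$, out of $\binom{n_1+n_2}{n_1}$. We state the formula in this modern form rather than as printed in [19]. The code checks it by enumeration and applies it to a small made-up pair of samples. Few runs point to a difference, so the lower tail is used; the names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def total_runs(labels):
    return 1 + sum(x != y for x, y in zip(labels, labels[1:]))

def runs_pmf(n1, n2, r):
    """P(R = r) for the total number of runs in a random arrangement (2 <= r <= n1 + n2)."""
    if r % 2 == 0:
        k = r // 2
        ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
    else:
        k = (r - 1) // 2
        ways = comb(n1 - 1, k) * comb(n2 - 1, k - 1) + comb(n1 - 1, k - 1) * comb(n2 - 1, k)
    return Fraction(ways, comb(n1 + n2, n1))

def runs_test(x, y):
    """Observed number of runs (x -> 0, y -> 1 after sorting) and its lower-tail probability."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])   # assumes no ties
    R = total_runs([label for _, label in pooled])
    return R, sum(runs_pmf(len(x), len(y), r) for r in range(2, R + 1))

# Exhaustive check of the formula for n1 = 5, n2 = 7.
counts = {}
for pos in combinations(range(12), 5):
    r = total_runs([0 if i in pos else 1 for i in range(12)])
    counts[r] = counts.get(r, 0) + 1
```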

The third principal topic I have chosen for my remarks is developments in the use of the probability integral transformation. The use of this device at all seems to be quite recent, appearing in a paper by H. Cramér in 1928 [21], who invented a test of goodness of fit which reappeared as the "$\omega^2$-test" in apparently independent work of R. von Mises in 1931 [22]. In 1932, in a section new in the fourth edition of "Statistical Methods for Research Workers" [23], Fisher showed the usefulness of this transformation in combining independent tests of significance, and in 1933 and 1934 Karl Pearson [24, 25] had papers in Biometrika on the subject.

As for the transformation itself, suppose that $p(x)$ is the probability density function of a continuous variable $x$ defined on the range $(a, b)$ such that

$$\int_a^b p(x)\,dx = 1.$$

Then let us introduce the variable

$$y = \int_a^x p(x)\,dx,$$

which is the probability that a value of the variable drawn at random will be less than $x$. It will be seen that, since $x$ is a random variable, the proportion of population values less than an $x$ drawn at random is itself a random variable. Perhaps this will be clearer if I use a simple example of J. Neyman's to show how a sample of $x$'s also determines a sample of $y$'s for a given $p(x)$. Suppose that

$$p(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2},$$

and that a sample of 5 values of $x$ arranged in order of magnitude is: $-1.5, -1.1, -0.5, 0.6, 1.6$. Then by reference to a table of areas under the normal curve of error, we find that the corresponding observed $y$'s are: 0.067, 0.136, 0.309, 0.726, 0.945. It is obvious that the range for $y$ is always, for any $p(x)$, $(0, 1)$. Further, if $f(y)$ is the probability density function for $y$, of course

$$f(y)\,dy = p(x)\,dx.$$

But from the definition of $y$,

$$dy = p(x)\,dx,$$

so that $f(y) = 1$. Thus, quite independently of $p(x)$, $y$ obeys a rectangular distribution law on the range $(0, 1)$.
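Neyman's five sample values and their $y$'s can be reproduced with `statistics.NormalDist` from the Python standard library, which here takes the place of the table of areas. The second half of the sketch draws from an exponential law, an arbitrary choice of non-normal $p(x)$ for illustration, and checks that the transformed values are spread evenly over $(0, 1)$.

```python
import math
import random
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)                  # the p(x) of the example
xs = [-1.5, -1.1, -0.5, 0.6, 1.6]
ys = [round(std_normal.cdf(x), 3) for x in xs]     # replaces the table of areas

# A non-normal p(x): the exponential law e^{-x} on (0, infinity), for which y = 1 - e^{-x}.
rng = random.Random(7)
sample_y = [1 - math.exp(-rng.expovariate(1.0)) for _ in range(20000)]
fractions_below = [sum(y < c for y in sample_y) / len(sample_y) for c in (0.25, 0.5, 0.75)]
```

About a quarter, a half and three quarters of the transformed values fall below 0.25, 0.5 and 0.75, as a rectangular law requires.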

This simplicity of the distribution of the quantity $y$ and its independence of $p(x)$ are most attractive properties. I shall note briefly some of the applications that have been made in recent years.

In 1936 W. R. Thompson [26] denoted by $p_k$ the probability that in a sample of $N$ a randomly chosen $x$ will be less than $x_k$, the $k$-th value observed. Then the probability that $p' \le p_k \le p''$ is just $p'' - p'$. The probability that exactly $r$ other members of the sample will be less than $x_k$ is then

$$\binom{N-1}{r} p_k^{\,r} (1 - p_k)^{N-1-r}.$$

Further, for all samples in which just $r$ values occur less than $x_k$, the proportion of occasions on which $p' \le p_k \le p''$ is given by

$$\frac{\int_{p'}^{p''} p^r (1 - p)^{N-r-1}\,dp}{B(r + 1, N - r)},$$

the difference of two incomplete $B$-function ratios. But that there are exactly $r$ observed $x$'s less than $x_k$ is equivalent to saying that $x_k$ is the $(r + 1)$-st observation in order of magnitude, so that in the above we may instead replace $r$ by $k - 1$. It is easy to find that the expected value of $p_k$ is then $k/(N + 1)$, and that the variance is $k(N - k + 1)/[(N + 1)^2 (N + 2)]$. It follows from the first of these two expressions that the proportion of occasions on which $x_k < x < x_{N-k+1}$ is $(N + 1 - 2k)/(N + 1)$, $(N + 1 > 2k)$. Statements of this kind establish confidence limits. Thus if one says that, for a sample of $N$, a further observation drawn at random will fall between the $k$-th and the $(N - k + 1)$-st observations in order of magnitude, such a statement has a probability of $1 - 2k/(N + 1)$ of being true. Or, the integral just above is the fiducial probability of the truth of $p' \le p_k \le p''$ if in a sample of $N$ the $k$-th observation is the $(r + 1)$-st in order of magnitude. Thompson went on to obtain confidence limits for the median in a sample of $N$ from any population.
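Two of these statements are easy to compute. The sketch below (hypothetical names) gives the exact proportion $(N + 1 - 2k)/(N + 1)$ and checks it by simulation. Uniform sampling suffices because, by the transformation above, the result does not depend on $p(x)$. It also computes the confidence coefficient of the interval from the $k$-th to the $(N - k + 1)$-st observation as limits for the median, $1 - 2\sum_{i<k}\binom{N}{i} 2^{-N}$. That median formula is the standard order-statistic result; we assume, without having checked [26], that it is the kind of limit Thompson gave.

```python
import random
from fractions import Fraction
from math import comb

def coverage_between(N, k):
    """Chance that a further observation falls between the k-th and (N-k+1)-st order statistics."""
    return Fraction(N + 1 - 2 * k, N + 1)

def median_confidence(N, k):
    """Confidence coefficient of (x_(k), x_(N-k+1)) as limits for the population median."""
    return 1 - 2 * Fraction(sum(comb(N, i) for i in range(k)), 2 ** N)

def simulate_coverage(N, k, trials, rng):
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(N))
        new = rng.random()
        hits += sample[k - 1] < new < sample[N - k]
    return hits / trials

simulated = simulate_coverage(10, 2, 40000, random.Random(0))
```

For $N = 10$ and $k = 2$ the interval catches a further observation with probability $7/11$, while as limits for the median it has confidence coefficient $501/512 \approx 0.979$.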


In 1939 Wald and Wolfowitz [27] studied the problem of obtaining, from a sample of $N$, confidence limits for the proportion of values less than a given $x$ in a population obeying any continuous distribution law. Their arguments are too complicated to attempt to sketch here, but they are based on the fact that the transformed variable $y$, as defined above, is rectangularly distributed on the interval $(0, 1)$. With their exact solution they gave a more convenient approximate method for calculation in applications.

In 1938 (I am not being strictly chronological) E. S. Pearson [28] published a study of test criteria based on this probability integral transformation. Suppose that we have $n$ independently observed $y$'s, $y_1, y_2, \ldots, y_n$. How should these $y$'s be used to test the hypothesis that the observations from which the $y$'s were calculated all came from the same population? K. Pearson [24] had already suggested the use of $Q = y_1 y_2 \cdots y_n$ or $Q' = (1 - y_1)(1 - y_2) \cdots (1 - y_n)$. It is known that a simple function of $Q$ or of $Q'$ obeys a $\chi^2$-distribution with $2n$ degrees of freedom, so that we have a ready means of combining independent tests based on $Q$ or $Q'$. But how is one to choose among $Q$, $Q'$ or other functions of the $y$'s that might be suggested? E. S. Pearson emphasized the role that the hypotheses conceived as alternate to the one being tested should play in making such a choice. He illustrates this in a case of testing the hypothesis that a sample came from a normal population of zero mean and unit variance, in which the alternate populations, from one of which the sample might have been drawn, are such that the corresponding $y$'s calculated on the hypothesis being tested would follow a Pearson Type I distribution law. Using the likelihood principle, he was led in this case to $Q$ or $Q'$, which are then concluded to be "best possible tests."
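The "simple function" in question is $-2 \ln Q$ (or $-2 \ln Q'$), which, when the $y$'s are independent and rectangular, is distributed as $\chi^2$ with $2n$ degrees of freedom. Since the degrees of freedom are even, the tail probability has the closed form $e^{-x/2} \sum_{j<n} (x/2)^j / j!$. The sketch below combines a few $y$'s this way; names and inputs are illustrative, and it says nothing about which of $Q$, $Q'$ or another criterion is best against a given alternate.

```python
import math

def chi2_sf_even(x, df):
    """P(chi-square with an even number df of degrees of freedom > x)."""
    term = total = 1.0
    for j in range(1, df // 2):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total

def combine(ys, use_q_prime=False):
    """Return -2 ln Q (or -2 ln Q') and its chi-square tail probability with 2n d.f."""
    logs = [math.log(1 - y) if use_q_prime else math.log(y) for y in ys]
    stat = -2 * sum(logs)
    return stat, chi2_sf_even(stat, 2 * len(ys))

stat, p = combine([0.067, 0.136, 0.309, 0.726, 0.945])
stat_prime, p_prime = combine([0.067, 0.136, 0.309, 0.726, 0.945], use_q_prime=True)
```

With a single $y$ the tail probability is simply $y$ itself (or $1 - y$ for $Q'$), which is a quick sanity check on the construction.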

The final paper I want to discuss is an important one by J. Neyman on the "Smooth test of goodness of fit," published in 1937 [29]. Suppose again that a random sample of $N$ values of $x$ gives the set $y_1, y_2, \ldots, y_N$ on the hypothesis $H_0$ that the population distribution law is $p(x \mid H_0)$. If $H_0$ is true, the $y$'s in random samples do follow a rectangular distribution on $(0, 1)$. But what would be the distribution of the $y$'s if the distribution law for the population were actually $p(x \mid H_1)$? We have for the $y$'s as calculated

$$y = \int_a^x p(x \mid H_0)\,dx.$$

But to find $f(y)$,

$$f(y)\,dy = p(x \mid H_1)\,dx,$$

so that

$$f(y) = \frac{p(x \mid H_1)}{p(x \mid H_0)} \not\equiv 1.$$

Therefore, if $H_0$ is not true, the $y$'s calculated on the assumption that it is may be expected to exhibit a statistically significant set of deviations from a rectangular distribution.
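The formula $f(y) = p(x \mid H_1)/p(x \mid H_0)$ can be made concrete with a hypothetical pair of hypotheses: $H_0$ a normal law with mean 0 and unit variance, and $H_1$ the same law shifted to mean 0.5. The sketch evaluates $f$ at $x = F_0^{-1}(y)$ with `statistics.NormalDist`, and checks numerically that $f$ still integrates to 1 over $(0, 1)$ while differing from the rectangular density.

```python
import math
from statistics import NormalDist

H0 = NormalDist(0.0, 1.0)     # hypothesis being tested
H1 = NormalDist(0.5, 1.0)     # hypothetical alternate

def f(y):
    """Density of y = F0(x) when x actually follows H1."""
    x = H0.inv_cdf(y)
    return H1.pdf(x) / H0.pdf(x)

def midpoint(g, n=20000):
    """Midpoint-rule integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((j + 0.5) * h) for j in range(n))

total = midpoint(f)
low, high = f(0.1), f(0.9)
```

Under this alternate the $y$'s pile up toward 1, since $f$ is below 1 for small $y$ and above 1 for large $y$; a test that looks at the pattern of deviations, not just their sizes, can pick this up.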

As Neyman remarks, it is a defect of the $\chi^2$-test of goodness of fit that the information one has of the algebraic signs of the differences between calculated and observed frequencies, particularly of the way in which positive and negative differences succeed each other, is completely unused. And in forming a test of a statistical hypothesis it is now well understood, thanks to Neyman and Pearson, that due account should be taken of the alternate hypotheses conceivably true.

Neyman begins by specifying a wide class of alternate hypotheses in a form that lends itself to mathematical treatment. This is done by assuming that the distribution of $y$'s calculated for $H_0$ will, if an alternate $H_1$ is true, be given by a function of the form

$$f(y \mid \theta_1, \ldots, \theta_k) = C(\theta_1, \ldots, \theta_k)\, \exp\Big\{ \sum_{i=1}^{k} \theta_i\, \pi_i(y) \Big\}, \qquad 0 < y < 1,$$

in which $\pi_i(y)$ is a polynomial of degree $i$ (a transformed Legendre polynomial) with convenient properties, and $C$ is the constant that makes the total probability unity. For low values of $k$, such as will ordinarily be used, this permits alternate distribution curves to deviate in a smooth manner from the distribution tested, with a limited number of intersections with it.
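A sketch of these ingredients, assuming (as is usual) that $\pi_i(y) = \sqrt{2i + 1}\, P_i(2y - 1)$, with $P_i$ the Legendre polynomial, so that the $\pi_i$ are orthonormal on $(0, 1)$; degrees up to 4 are coded. The normalizing constant $C$ is found numerically, and the $\theta$ values are arbitrary illustrations. Function names are hypothetical.

```python
import math

def pi(i, y):
    """Orthonormal shifted Legendre polynomial of degree i (0 <= i <= 4) on (0, 1)."""
    t = 2 * y - 1
    legendre = [1.0, t, (3 * t**2 - 1) / 2, (5 * t**3 - 3 * t) / 2,
                (35 * t**4 - 30 * t**2 + 3) / 8]
    return math.sqrt(2 * i + 1) * legendre[i]

def midpoint(g, n=4000):
    """Midpoint-rule integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((j + 0.5) * h) for j in range(n))

def smooth_alternative(thetas):
    """Density C * exp(sum_i theta_i pi_i(y)) on (0, 1), with C fixed numerically."""
    def raw(y):
        return math.exp(sum(th * pi(i, y) for i, th in enumerate(thetas, start=1)))
    c = 1.0 / midpoint(raw)
    return lambda y: c * raw(y)

density = smooth_alternative([0.4, -0.2])   # an illustrative alternate of class k = 2
```

Setting every $\theta_i = 0$ gives back the rectangular law, so the $\theta$'s measure how far, and in what smooth fashion, an alternate departs from $H_0$.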

Now the problem is to determine the function of the observed $y$'s which will provide a suitable test of fit with respect to the alternate hypotheses of order or class $k$, $k$ having been decided upon in advance of making the test. The mathematics, proceeding along Neyman and Pearson lines, shows that the appropriate function, for large samples at least, is simply

$$\Psi_k^2 = \sum_{i=1}^{k} u_i^2, \qquad\text{where}\qquad u_i = \frac{1}{\sqrt{N}} \sum_{j=1}^{N} \pi_i(y_j),$$

the $y_j$'s being calculated from the sample. Moreover, the probability that the sum $\sum_{i=1}^{k} u_i^2$ exceeds a given value is at once obtained from a table of the incomplete $\Gamma$-function, i.e., this sum is distributed approximately as $\chi^2$ with $k$ degrees of freedom.
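A minimal sketch of the test statistic, assuming the orthonormal polynomials $\pi_i(y) = \sqrt{2i + 1}\, P_i(2y - 1)$ and the $\chi^2$ approximation with $k$ degrees of freedom; the tail probability is computed with the standard library through the usual recurrence in the degrees of freedom. The two input sets are made up: one spread evenly over $(0, 1)$, which should fit well, and one crowded toward 0, which should not. Names are hypothetical.

```python
import math

def pi(i, y):
    """Orthonormal shifted Legendre polynomial of degree i (1 <= i <= 4) on (0, 1)."""
    t = 2 * y - 1
    legendre = [1.0, t, (3 * t**2 - 1) / 2, (5 * t**3 - 3 * t) / 2,
                (35 * t**4 - 30 * t**2 + 3) / 8]
    return math.sqrt(2 * i + 1) * legendre[i]

def chi2_sf(x, k):
    """P(chi-square with k degrees of freedom > x)."""
    if k % 2:
        q, j = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, j = math.exp(-x / 2), 2
    while j < k:
        q += (x / 2) ** (j / 2) * math.exp(-x / 2) / math.gamma(j / 2 + 1)
        j += 2
    return q

def smooth_test(ys, k):
    """Psi_k^2 = sum of u_i^2 with u_i = N^(-1/2) sum_j pi_i(y_j), and its chi-square(k) tail."""
    N = len(ys)
    us = [sum(pi(i, y) for y in ys) / math.sqrt(N) for i in range(1, k + 1)]
    psi2 = sum(u * u for u in us)
    return psi2, chi2_sf(psi2, k)

N = 200
even = [(j + 0.5) / N for j in range(N)]       # spread evenly over (0, 1)
crowded = [y ** 2 for y in even]               # piled up near 0
psi_even, p_even = smooth_test(even, 4)
psi_crowded, p_crowded = smooth_test(crowded, 4)
```

Note that one $y$ per observation enters the sums, which is the computing burden referred to below.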

This is a very fine piece of work, but, as Neyman points out, there are still questions to be settled concerning the general utility of this "smooth test." F. N. David in 1939 [30] further discussed this test. In particular, it may be pointed out that the parameters in $p(x \mid H_0)$ must be assumed known; what would be the effect on the test of estimating these parameters is unknown. A reasonably large sample seems to be required to make the developments on the assumption of large samples applicable, but a $y$ must be calculated for each observation. This makes for a good deal of computing, but it is not known how grouping of observations might be effected. And the matter of the choice of the order of the test to be applied, i.e., of a value of $k$, is still somewhat in doubt.

I will not debate the proposition that there are papers completely omitted from this discussion as important as those I have included, however inadequately. The limitations of space forced me to choose, and it is quite possible that my personal tastes and interests had more weight than they should.


REFERENCES 


[1] H. Hotelling, "Relations between two sets of variates," Biometrika, 28 (1936), 321-377.

[2] S. S. Wilks, "Certain generalizations in the analysis of variance," Biometrika, 24 (1932), 471-494.

[3] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, 7 (1936), 179-188.

[4] H. Hotelling, "The generalization of Student's ratio," Annals of Math. Stat., 2 (1931), 360-378.

[5] R. A. Fisher, "The statistical utilization of multiple measurements," Annals of Eugenics, 8 (1938), 376-386.

[6] R. A. Fisher, "The sampling distribution of some statistics obtained from non-linear equations," Annals of Eugenics, 9 (1939), 238-249.

[7] R. A. Fisher, "The precision of discriminant functions," Annals of Eugenics, 10.

[8] P. C. Mahalanobis, "On the generalized distance in statistics," Proc. Nat. Inst. Sci. India, 2 (1936), 49-55.

[9] R. C. Bose, "On the exact distribution of the D² statistic," Sankhyā, 2 (1936), 143-154.

[10] R. C. Bose and S. N. Roy, "The exact distribution of the Studentized D² statistic," Sankhyā, part 4.

[11] P. L. Hsu, "On the distribution of roots of certain determinantal equations," Annals of Eugenics, 9 (1939), 250-258.

[12] M. A. Girshick, "On the sampling theory of the roots of determinantal equations," Annals of Math. Stat., 10 (1939), 203-224.

[13] M. S. Bartlett, "Further aspects of the theory of multiple regression," Proc. Camb. Phil. Soc., 34 (1938), 33-40.

[14] M. S. Bartlett, "The statistical significance of canonical correlations," Biometrika, 32 (1941), 29-37.

[15] W. O. Kermack and A. G. McKendrick, "Tests for randomness in a series of numerical observations," Proc. Roy. Soc. Edin., 57 (1937), 228-240.

[16] W. O. Kermack and A. G. McKendrick, "Some distributions associated with a randomly arranged set of numbers," Ibid., 332-376.

[17] W. O. Kermack and A. G. McKendrick, "Some properties of points arranged at random on a Möbius surface," Mathematical Gazette, 22 (1938), 66-72.

[18] A. M. Mood, "The distribution theory of runs," Annals of Math. Stat., 11 (1940), 367-392.

[19] A. Wald and J. Wolfowitz, "On a test whether two samples are from the same population," Annals of Math. Stat., 11 (1940), 147-162.

[20] W. L. Stevens, "Distribution of groups in a sequence of alternatives," Annals of Eugenics, 9 (1939), 10-17.

[21] H. Cramér, "On the composition of elementary errors. Second paper: Statistical applications," Skandinavisk Aktuarietidskrift, 11 (1928), 141-180.

[22] R. von Mises, "Vorlesungen aus dem Gebiete der angewandten Mathematik," Bd. 1: Wahrscheinlichkeitsrechnung (Leipzig, 1931), 316-335.

[23] R. A. Fisher, Statistical Methods for Research Workers (Edinburgh, 4th edition, 1932), Article 21.1.

[24] K. Pearson, "On a method of determining whether a sample of size n supposed to have been drawn from a parent population having a known probability integral has probably been drawn at random," Biometrika, 25 (1933), 379-410.

[25] K. Pearson, "On a new method of determining 'goodness of fit'," Biometrika, 26 (1934), 425-442.

[26] W. R. Thompson, "On confidence ranges for the median and other expectation distributions for populations of unknown distribution form," Annals of Math. Stat., 7 (1936), 122-128.

[27] A. Wald and J. Wolfowitz, "Confidence limits for continuous distribution functions," Annals of Math. Stat., 10 (1939), 105-118.

[28] E. S. Pearson, "The probability integral transformation for testing goodness of fit and combining independent tests of significance," Biometrika, 30 (1938).

[29] J. Neyman, "A smooth test of goodness of fit," Skandinavisk Aktuarietidskrift, 20 (1937), 149-199.

[30] F. N. David, "On Neyman's 'smooth' test for goodness of fit. I. Distribution of the criterion Ψ² when the hypothesis tested is true," Biometrika, 31 (1939), 191-199.



NOTES

This section is devoted to brief research and expository articles, notes on methodology and other short items.


A FURTHER REMARK CONCERNING THE DISTRIBUTION OF THE 
RATIO OF THE MEAN SQUARE SUCCESSIVE DIFFERENCE 
TO THE VARIANCE¹

By John von Neumann²
Institute for Advanced Study

1. Introduction. In our previous paper¹ it was found convenient to assume that the number m (of the variables of the quadratic form under consideration) is even. (Cf. p. 383, loc. cit.) This means that in the application to the mean square successive difference, n = m + 1 must be odd. (Cf. p. 391, id.)

In this note we shall show that the distribution for an odd m (i.e., an even n) can be expressed by means of the distribution for an even m, the latter being already known, loc. cit.

Specifically, consider the distribution of

$$y = \sum_{\mu=1}^{m} a_\mu x_\mu^2$$

if the $x_1, \ldots, x_m$ are equidistributed over the surface $\sum_{\mu=1}^{m} x_\mu^2 = 1$. Denote the m-uplet $a_1, \ldots, a_m$ by A. The distribution function of y then depends on A; denote that distribution by $\omega_A(y)$. (Cf. p. 372, id.; we write $a_\mu$ for the $B_\mu$ there.)

Now consider an m-uplet $A = (a_1, \ldots, a_m)$ and a p-uplet $B = (b_1, \ldots, b_p)$, and form the (m + p)-uplet $C = (a_1, \ldots, a_m, b_1, \ldots, b_p)$. Write C = A + B. Then we shall show that there exists a simple expression for $\omega_C(y)$ in terms of $\omega_A(y)$ and $\omega_B(y)$.

For the specific application to the mean square successive difference, we can put n = m + 1 and

$$A = \big(\cos(\pi\mu/n)\ \text{for}\ \mu = 1, \ldots, \tfrac{1}{2}n - 1, \tfrac{1}{2}n + 1, \ldots, n - 1\big),$$
$$B = (0), \qquad C = A + B = \big(\cos(\pi\mu/n)\ \text{for}\ \mu = 1, \ldots, n - 1\big).$$
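The cosine coefficients can be checked numerically. The sketch below relies on a standard fact that this note does not spell out. The form $\sum_{i=1}^{n-1}(x_{i+1}-x_i)^2$ has eigenvalues $2 - 2\cos(\pi\mu/n)$ for μ = 0, …, n − 1, and the value for μ = 0 belongs to the constant vector. For centred data, therefore, the ratio of $\sum (x_{i+1}-x_i)^2$ to $\sum (x_i-\bar x)^2$ equals $2 - 2y$, where $y = \sum_{\mu \ge 1}\cos(\pi\mu/n)\,w_\mu^2$ for a unit vector w. For even n, the coefficient with μ = n/2 is cos(π/2) = 0, which is the one-element B = (0) split off above. The function names are illustrative only.

```python
import numpy as np

def successive_difference_matrix(n):
    """Matrix L with x @ L @ x == sum((x[i+1] - x[i])**2 for i in range(n - 1))."""
    d = np.diff(np.eye(n), axis=0)  # (n-1) x n first-difference operator
    return d.T @ d

def check_cosine_coefficients(n, seed=0):
    """Return (ratio computed directly, 2 - 2y computed from the cosine coefficients)."""
    L = successive_difference_matrix(n)
    eigvals, eigvecs = np.linalg.eigh(L)        # ascending order, i.e. mu = 0, ..., n-1
    mu = np.arange(n)
    assert np.allclose(eigvals, 2 - 2 * np.cos(np.pi * mu / n))

    x = np.random.default_rng(seed).standard_normal(n)
    z = eigvecs.T @ (x - x.mean())              # coordinates in the eigenbasis; z[0] ~ 0
    w = z[1:] / np.linalg.norm(z[1:])           # a point on the unit sphere in R^(n-1)
    y = np.sum(np.cos(np.pi * mu[1:] / n) * w ** 2)
    ratio = np.sum(np.diff(x) ** 2) / np.sum((x - x.mean()) ** 2)
    return ratio, 2 - 2 * y

for n in (6, 8):
    direct, via_y = check_cosine_coefficients(n)
    print(n, direct, via_y, np.cos(np.pi * (n // 2) / n))  # last value: the B = (0) coefficient
```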

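2. The recursion formula. We proceed as follows. $\omega_A(y)$ can also be used to express the joint statistics of

$$y = \sum_{\mu=1}^{m} a_\mu x_\mu^2, \qquad \rho = \sum_{\mu=1}^{m} x_\mu^2,$$

or better, the volume of that part of the $x_1, \ldots, x_m$-space which corresponds to any given domain in the y, ρ-plane. Thus the volume corresponding to a given infinitesimal y, ρ domain dy dρ will clearly be

$$C_m\,\omega_A'\!\left(\frac{y}{\rho}\right)\frac{dy}{\rho}\cdot\rho^{(m-1)/2}\,\frac{d\rho}{2\rho^{1/2}},$$

where $C_m$ is the ((m − 1)-dimensional) area of the $x_1, \ldots, x_m$-surface $\sum_{\mu=1}^{m} x_\mu^2 = 1$ (the unit sphere), i.e., this volume is

$$\tfrac{1}{2}C_m\,\omega_A'\!\left(\frac{y}{\rho}\right)\rho^{(m/2)-2}\,dy\,d\rho. \tag{1}$$

Similarly for

$$\xi = \sum_{\nu=1}^{p} b_\nu u_\nu^2 \quad\text{and}\quad \sigma = \sum_{\nu=1}^{p} u_\nu^2$$

the volume corresponding to the infinitesimal ξ, σ domain dξ dσ is

$$\tfrac{1}{2}C_p\,\omega_B'\!\left(\frac{\xi}{\sigma}\right)\sigma^{(p/2)-2}\,d\xi\,d\sigma. \tag{2}$$

Finally for

$$\theta = y + \xi = \sum_{\mu=1}^{m} a_\mu x_\mu^2 + \sum_{\nu=1}^{p} b_\nu u_\nu^2 \quad\text{and}\quad \tau = \rho + \sigma = \sum_{\mu=1}^{m} x_\mu^2 + \sum_{\nu=1}^{p} u_\nu^2$$

the volume corresponding to the infinitesimal θ, τ domain dθ dτ is

$$\tfrac{1}{2}C_{m+p}\,\omega_C'\!\left(\frac{\theta}{\tau}\right)\tau^{((m+p)/2)-2}\,d\theta\,d\tau. \tag{3}$$

Now θ = y + ξ, τ = ρ + σ connect (1), (2), (3) as follows:

$$\tfrac{1}{2}C_{m+p}\,\omega_C'\!\left(\frac{\theta}{\tau}\right)\tau^{((m+p)/2)-2} = \iint \tfrac{1}{4}C_mC_p\,\omega_A'\!\left(\frac{y}{\rho}\right)\omega_B'\!\left(\frac{\theta-y}{\tau-\rho}\right)\rho^{(m/2)-2}(\tau-\rho)^{(p/2)-2}\,dy\,d\rho.$$

This gives (either by simply putting τ = 1, or else by replacing θ, y, ρ by τθ, τy, τρ)

$$\omega_C'(\theta) = \frac{C_mC_p}{2C_{m+p}}\int_0^1\!\!\int \omega_A'\!\left(\frac{y}{\rho}\right)\omega_B'\!\left(\frac{\theta-y}{1-\rho}\right)\rho^{(m/2)-2}(1-\rho)^{(p/2)-2}\,dy\,d\rho.$$

To determine $C_mC_p/2C_{m+p}$, apply $\int \cdots d\theta$ to this. Since $\int \omega_A'(y/\rho)\,dy = \rho$ and $\int \omega_B'(\xi/(1-\rho))\,d\xi = 1-\rho$, this gives

$$1 = \frac{C_mC_p}{2C_{m+p}}\int_0^1 \rho^{(m/2)-1}(1-\rho)^{(p/2)-1}\,d\rho = \frac{C_mC_p}{2C_{m+p}}\cdot\frac{\Gamma(m/2)\,\Gamma(p/2)}{\Gamma((m+p)/2)}.$$

Accordingly:

$$\omega_C'(\theta) = \frac{\Gamma((m+p)/2)}{\Gamma(m/2)\,\Gamma(p/2)}\int_0^1\!\!\int \omega_A'\!\left(\frac{y}{\rho}\right)\omega_B'\!\left(\frac{\theta-y}{1-\rho}\right)\rho^{(m/2)-2}(1-\rho)^{(p/2)-2}\,dy\,d\rho. \tag{I}$$

¹ Cf. the paper by the same author, Annals of Math. Stat., vol. 12 (1941), pp. 367-395.
² Also Scientific Advisory Committee of the Ballistic Research Laboratory, Aberdeen Proving Ground.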


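3. The special case. Let us now return to the special case mentioned at the end of 1, the application to the mean square successive difference. There p = 1 and B = (0), so the "distribution" of ξ is concentrated at the point 0. Hence $\omega_B'(\xi)$ is an "improper" function, $\omega_B'(\xi) = \delta(\xi)$. In the same way, $\omega_B'((\theta-y)/(1-\rho)) = (1-\rho)\,\delta(\theta-y)$.³ Using C and A as described at the end of 1, formula (I) becomes (now m = n − 2, p = 1)

$$\omega_C'(\theta) = \frac{\Gamma(\tfrac{1}{2}(n-1))}{\Gamma(\tfrac{1}{2}(n-2))\,\Gamma(\tfrac{1}{2})}\int_0^1 \omega_A'\!\left(\frac{\theta}{\rho}\right)\rho^{(n-6)/2}(1-\rho)^{-1/2}\,d\rho. \tag{II}$$

It would have been equally easy, of course, to establish (II) directly. Putting ρ = 1/t gives

$$\omega_C'(\theta) = \frac{\Gamma(\tfrac{1}{2}(n-1))}{\Gamma(\tfrac{1}{2}(n-2))\,\Gamma(\tfrac{1}{2})}\int_1^\infty \omega_A'(\theta t)\,t^{-(n-3)/2}(t-1)^{-1/2}\,dt. \tag{III}$$

Since $\omega_A'(y)$ vanishes for $|y| > \cos(\pi/n)$, we may replace the upper limit ∞ of this integral by $\cos(\pi/n)/|\theta|$.

Formula (III) can be used for numerical work. It can also be used to extend formula (3) on p. 391, loc. cit., to even values of n.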
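Formula (II) can be read probabilistically, and the Monte Carlo sketch below checks that reading. Take a point uniform on the unit sphere in $R^{m+1}$. The squared length ρ of its first m coordinates then has a Beta(m/2, 1/2) distribution and is independent of the direction of those coordinates. As a result, the statistic for C is distributed as ρ times an independent copy of the statistic for A, which is what the weight $\rho^{(m/2)-2}(1-\rho)^{-1/2}$ in (II) expresses. This reading is our own paraphrase of (II), not the author's wording. The choice n = 8, the sample size, the seed and the function names are arbitrary, and agreement holds only up to simulation error.

```python
import numpy as np

def uniform_on_sphere(rng, size, dim):
    """Points uniformly distributed on the unit sphere in R^dim (normalized Gaussians)."""
    g = rng.standard_normal((size, dim))
    return g / np.linalg.norm(g, axis=1, keepdims=True)

def simulate_II(n=8, size=200_000, seed=1):
    """Samples of the C-statistic drawn directly and via rho * (A-statistic), for even n."""
    assert n % 2 == 0
    rng = np.random.default_rng(seed)
    mu_c = np.arange(1, n)                  # C: mu = 1, ..., n - 1
    mu_a = mu_c[mu_c != n // 2]             # A: drop mu = n/2, where the cosine is 0
    a_c = np.cos(np.pi * mu_c / n)
    a_a = np.cos(np.pi * mu_a / n)
    m = len(a_a)                            # m = n - 2

    y_c = uniform_on_sphere(rng, size, n - 1) ** 2 @ a_c
    y_a = uniform_on_sphere(rng, size, m) ** 2 @ a_a
    rho = rng.beta(m / 2, 0.5, size)        # squared length of the first m coordinates
    return y_c, rho * y_a

y_direct, y_mixture = simulate_II()
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(y_direct, qs))
print(np.quantile(y_mixture, qs))
```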


CONVEXITY PROPERTIES OF GENERALIZED MEAN VALUE 

FUNCTIONS 

By E. F. Beckenbach
University of Michigan

In an article appearing in the Annals of Mathematical Statistics,¹ it was pointed out that the mean value functions appearing below have been studied and used since 1840, yet there appeared to have been no attempt to investigate the behavior of their second derivatives.

Consider (1) the unit weight or simple sample form

$$\varphi(t) = \left(\frac{x_1^t + x_2^t + \cdots + x_n^t}{n}\right)^{1/t},$$

in which the $x_i$ are positive numbers and in which t may take any real value; (2) the weighted sample form

$$\psi(t) = \left(\frac{c_1x_1^t + c_2x_2^t + \cdots + c_nx_n^t}{c_1 + c_2 + \cdots + c_n}\right)^{1/t},$$

in which the $c_i$ are positive numbers, and in which the $x_i$ and t are restricted as in (1); and (3) the integral form

$$\theta(t) = \left(\frac{1}{x_2 - x_1}\int_{x_1}^{x_2}[f(x)]^t\,dx\right)^{1/t},$$

in which f(x) is a positive continuous function for $x_1 \le x \le x_2$.

Since the analysis and results are essentially the same in all three cases, we restrict our attention to θ(t).

³ Dirac's famous "delta function." It could be described by a Stieltjes integral.

¹ Nilan Norris, "Convexity properties of generalized mean value functions," Annals of Math. Stat., Vol. 8 (1937), pp. 118-120.

As is well known,² θ(t) is a monotone non-decreasing function which varies from the minimum of f(x) to the maximum of f(x) as t increases from −∞ to +∞. It is further of some importance to study the rate at which the rate of increase of this type bias is changing as t increases; the rate in question is given by the second derivative θ''(t).

The following points were made by Norris, loc. cit.: (1) Since, as we have pointed out, θ(t) has two horizontal asymptotes, θ(t) must have at least one inflection point. (2) Consideration of a simple example shows that there is not necessarily an inflection point at t = 0; θ''(0) can be made to take on any real value.

Thus it is not true that θ''(t) must be positive for all t < 0 and negative for all t > 0. On the other hand, we shall give simple bounds for θ''(t) in the other direction: a positive upper bound of θ''(t) for t < 0 and a lower bound for t > 0. These bounds are precise in the sense that they are actually attained in the special case f(x) ≡ const. Their main advantage is that, while the expression for θ''(t) is quite involved, these bounds are simple expressions in the quantities θ(t) and θ'(t), which might already have been computed.

Let

$$\lambda(t) \equiv \log\theta(t).$$

Differentiating, we obtain

$$\lambda'(t) = \frac{\theta'(t)}{\theta(t)} = \frac{\int_{x_1}^{x_2}[f(x)]^t\log f(x)\,dx}{t\int_{x_1}^{x_2}[f(x)]^t\,dx} - \frac{1}{t^2}\log\left(\frac{1}{x_2 - x_1}\int_{x_1}^{x_2}[f(x)]^t\,dx\right).$$

It follows³ that

$$\lambda'(t) \ge 0 \quad\text{and}\quad \theta'(t) \ge 0.$$

Let

$$\mu(t) \equiv t^2\lambda'(t).$$

² See, for instance, G. Pólya und G. Szegő, Aufgaben und Lehrsätze aus der Analysis (Berlin, 1925), Vol. 1, pp. 54-56 and 210-211.

³ See G. Pólya und G. Szegő, loc. cit., p. 210.


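Curiously, while λ''(t) and θ''(t) appear to be rather formidable, the closely related quantity μ'(t) is relatively simple. Two of the terms obtained by formal differentiation are negatives of each other, and Schwarz' inequality can be applied to the remaining terms, as follows. We obtain

$$\mu'(t) = t\,\frac{\int[f(x)]^t\,dx\int[f(x)]^t(\log f(x))^2\,dx - \left(\int[f(x)]^t\log f(x)\,dx\right)^2}{\left(\int[f(x)]^t\,dx\right)^2},$$

all integrals being taken from $x_1$ to $x_2$. By Schwarz' inequality,⁴ it follows that

$$\mu'(t) = t\,\nu(t), \qquad\text{with}\qquad \nu(t) \ge 0,$$

the sign of equality holding if and only if f(x) ≡ const.

From the definition of μ(t) we obtain

$$\mu'(t) = t\,[2\lambda'(t) + t\lambda''(t)] = t\left[\frac{2\theta'(t)}{\theta(t)} + t\left(\frac{\theta''(t)}{\theta(t)} - \frac{\theta'(t)^2}{\theta(t)^2}\right)\right],$$

whence

$$2\lambda'(t) + t\lambda''(t) = \frac{1}{\theta(t)}\left[2\theta'(t) + t\theta''(t) - t\,\frac{\theta'(t)^2}{\theta(t)}\right] = \nu(t) \ge 0;$$

that is,

$$t\lambda''(t) \ge -2\lambda'(t), \qquad t\theta''(t) \ge -2\theta'(t) + t\,\frac{\theta'(t)^2}{\theta(t)}.$$

It follows that for t < 0, we have

$$\lambda''(t) \le -\frac{2\lambda'(t)}{t} \qquad\text{and}\qquad \theta''(t) \le \frac{\theta'(t)^2}{\theta(t)} - \frac{2\theta'(t)}{t},$$

while for t > 0, we have

$$\lambda''(t) \ge -\frac{2\lambda'(t)}{t} \qquad\text{and}\qquad \theta''(t) \ge \frac{\theta'(t)^2}{\theta(t)} - \frac{2\theta'(t)}{t}.$$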
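The bounds can be checked numerically on the simple sample form (1), which, as stated above, behaves like θ(t). This sketch assumes a small positive data set of our choosing. It estimates θ'(t) and θ''(t) by central finite differences with an arbitrary step h = 10⁻³ and compares θ''(t) with $\theta'(t)^2/\theta(t) - 2\theta'(t)/t$. The function names are illustrative. For constant data, both sides vanish, in line with the statement that the bounds are attained when f(x) ≡ const.

```python
import math

def power_mean(xs, t):
    """Simple sample form (1): ((x_1^t + ... + x_n^t) / n)^(1/t); geometric mean at t = 0."""
    n = len(xs)
    if t == 0:
        return math.exp(sum(math.log(x) for x in xs) / n)
    return (sum(x ** t for x in xs) / n) ** (1.0 / t)

def derivatives(xs, t, h=1e-3):
    """theta(t), theta'(t), theta''(t), the derivatives by central finite differences."""
    f0, fp, fm = power_mean(xs, t), power_mean(xs, t + h), power_mean(xs, t - h)
    return f0, (fp - fm) / (2 * h), (fp - 2 * f0 + fm) / (h * h)

def bound(theta, d1, t):
    """theta'^2/theta - 2 theta'/t: upper bound of theta'' for t < 0, lower bound for t > 0."""
    return d1 * d1 / theta - 2 * d1 / t

xs = [1.0, 2.0, 5.0]
for t in (-3.0, -1.0, -0.5, 0.5, 1.0, 3.0):
    th, d1, d2 = derivatives(xs, t)
    print(f"t={t:+.1f}  theta''={d2:.5f}  bound={bound(th, d1, t):.5f}")
```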


⁴ See G. Pólya und G. Szegő, loc. cit., p. 54.





A CHARACTERIZATION OF THE NORMAL DISTRIBUTION 

By Eugene Lukacs 
Baltimore, Md.

1. In sampling from a normal population the distributions of the mean and of the variance are mutually independent. This well known property of the normal distribution is used in deriving the distribution of "Student's" ratio. The independence of the distributions of the mean and of the variance characterizes the normal distribution. To show this one has to prove the following statement:

A necessary and sufficient condition for the normality of the parent distribution is that the sampling distributions of the mean and of the variance be independent.

That this condition is necessary follows from the above mentioned property of the normal distribution, so it remains only to prove that the condition is sufficient. This was first proved by R. C. Geary¹ by using some of R. A. Fisher's general formulae for the semi-invariants. However, a different proof, using characteristic functions, might be of some interest.

2. Let f(x) be the density function of a continuous probability distribution and let $x_1, x_2, \ldots, x_n$ be n observations of the variate x. Denote by

$$\bar{x} = \sum_{\nu=1}^{n} x_\nu/n$$

the sample mean, and by

$$s^2 = \sum_{\nu=1}^{n}(x_\nu - \bar{x})^2/n = \Big[(n-1)\sum_{\nu=1}^{n}x_\nu^2 - 2\sum_{\nu=1}^{n-1}\sum_{\mu=1}^{n-\nu}x_\nu x_{\nu+\mu}\Big]\Big/n^2$$

the sample variance of these observations. The characteristic function of the distribution is then given by

$$\psi(t) = \int e^{itx}f(x)\,dx. \tag{1}$$

The characteristic function of the joint distribution of the statistics $\bar{x}$ and $s^2$ is known to be

$$\varphi(t_1, t_2) = \int\cdots\int e^{it_1\bar{x} + it_2s^2}f(x_1)\cdots f(x_n)\,dx_1\cdots dx_n. \tag{2}$$

In the same way one obtains the characteristic function of the mean $\bar{x}$ as

$$\varphi_1(t_1) = \varphi(t_1, 0) = \int\cdots\int e^{it_1\bar{x}}f(x_1)\cdots f(x_n)\,dx_1\cdots dx_n \tag{2a}$$

¹ R. C. Geary, "Distribution of Student's ratio for non-normal samples," Roy. Stat. Soc. Jour., Supp. Vol. 3, no. 2.





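and the characteristic function of the distribution of the variance as

$$\varphi_2(t_2) = \varphi(0, t_2) = \int\cdots\int e^{it_2s^2}f(x_1)\cdots f(x_n)\,dx_1\cdots dx_n. \tag{2b}$$

The independence of the distributions of $\bar{x}$ and $s^2$ requires that the characteristic functions satisfy $\varphi(t_1, t_2) = \varphi_1(t_1)\varphi_2(t_2)$, so that

$$\left[\frac{\partial\varphi(t_1, t_2)}{\partial t_2}\right]_{t_2=0} = \varphi_1(t_1)\left[\frac{d\varphi_2(t_2)}{dt_2}\right]_{t_2=0}. \tag{3}$$

Substituting $\bar{x} = \sum x_\nu/n$ in (2a), it is seen that

$$\varphi_1(t_1) = \prod_{\nu=1}^{n}\int e^{it_1x_\nu/n}f(x_\nu)\,dx_\nu = \left[\int e^{it_1x/n}f(x)\,dx\right]^n = [\psi(t_1/n)]^n,$$

therefore

$$\left[\frac{\partial\varphi(t_1, t_2)}{\partial t_2}\right]_{t_2=0} = [\psi(t_1/n)]^n\left[\frac{d\varphi_2(t_2)}{dt_2}\right]_{t_2=0}.$$

Differentiating (2) with respect to $t_2$ we get

$$\left[\frac{\partial\varphi(t_1, t_2)}{\partial t_2}\right]_{t_2=0} = i\int\cdots\int s^2e^{it_1\bar{x}}f(x_1)\cdots f(x_n)\,dx_1\cdots dx_n.$$

Substituting $s^2 = [(n-1)\sum x_\nu^2 - 2\sum\sum x_\nu x_{\nu+\mu}]/n^2$ and $\bar{x} = \sum x_\nu/n$, we obtain easily

$$\left[\frac{\partial\varphi(t_1, t_2)}{\partial t_2}\right]_{t_2=0} = i\,\frac{n-1}{n}\,[\psi(t_1/n)]^{n-2}\left\{\psi(t_1/n)\int x^2e^{it_1x/n}f(x)\,dx - \left[\int xe^{it_1x/n}f(x)\,dx\right]^2\right\}. \tag{4a}$$

In a similar way it is seen that

$$\left[\frac{d\varphi_2(t_2)}{dt_2}\right]_{t_2=0} = i\,\frac{n-1}{n}\,\sigma^2. \tag{4b}$$

Here σ² denotes the population variance of the parent distribution. Substitute (4a) and (4b) in relation (3), divide by the common factor $i\,\frac{n-1}{n}[\psi(t_1/n)]^{n-2}$ (permissible near t = 0, where ψ ≠ 0), and write $t = t_1/n$. One then has

$$\psi(t)\int x^2e^{itx}f(x)\,dx - \left[\int xe^{itx}f(x)\,dx\right]^2 = \sigma^2[\psi(t)]^2. \tag{5}$$

Considering the definition (1) of the characteristic function it is seen that

$$\frac{d^k\psi(t)}{dt^k} = i^k\int x^ke^{itx}f(x)\,dx. \tag{6}$$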





The integrals on the left side of relation (5) are of this form. So one may write the relation expressing statistical independence of the sample mean and the sample variance as a differential equation for the characteristic function ψ(t), namely

$$\psi(t)\psi''(t) - [\psi'(t)]^2 + \sigma^2[\psi(t)]^2 = 0. \tag{7}$$

The initial conditions to be satisfied are

$$\psi(0) = 1, \qquad \psi'(0) = im, \tag{7a}$$

where m is the population mean of the parent distribution. Equation (7) states that $d^2\log\psi(t)/dt^2 = -\sigma^2$. Integrating, it is seen that the characteristic function is

$$\psi(t) = e^{imt - \sigma^2t^2/2}, \tag{8}$$

which is the characteristic function of the normal distribution.
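Equation (7) is easy to probe numerically. The sketch below evaluates the left side of (7) by central finite differences for two characteristic functions. The first is the normal one in (8), with illustrative values m = 1.5 and σ² = 2. The second, chosen by us as a non-normal comparison, is the exponential law with mean 1 and variance 1, whose characteristic function is 1/(1 − it). The residual vanishes, up to discretization error, only in the normal case. For the exponential law at t = 1, the exact residual is 1/4 + i/2. The step size and function names are arbitrary.

```python
import cmath

def residual_7(psi, sigma2, t, h=1e-4):
    """Left side of (7), psi*psi'' - psi'^2 + sigma2*psi^2, via central differences."""
    p0, p_plus, p_minus = psi(t), psi(t + h), psi(t - h)
    d1 = (p_plus - p_minus) / (2 * h)
    d2 = (p_plus - 2 * p0 + p_minus) / (h * h)
    return p0 * d2 - d1 * d1 + sigma2 * p0 * p0

M, SIGMA2 = 1.5, 2.0  # illustrative population mean and variance

def normal_cf(t):
    """Formula (8) with m = M and sigma^2 = SIGMA2."""
    return cmath.exp(1j * M * t - SIGMA2 * t * t / 2)

def exponential_cf(t):
    """Exponential law with mean 1 and variance 1: psi(t) = 1 / (1 - it)."""
    return 1 / (1 - 1j * t)

for t in (0.3, 1.0, 2.0):
    print(t, abs(residual_7(normal_cf, SIGMA2, t)), abs(residual_7(exponential_cf, 1.0, t)))
```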


3. This reasoning applies also to the multivariate case. Let $f(x_1, x_2, \ldots, x_p)$ be the density of the p variates $x_1, x_2, \ldots, x_p$. Denote by $x_{k\alpha}$ (k = 1, 2, …, p; α = 1, 2, …, n) the α-th observation on the k-th variate, by $\bar{x}_k$ the sample mean of this variate, and by $s_{lm}$ the sample covariance between the l-th and m-th variates. Assume that the distribution of $s_{lm}$ is independent of the joint distribution of the p sample means $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$. One then obtains the equation

$$\psi\,\psi_{lm} - \psi_l\,\psi_m = -\sigma_{lm}\,\psi^2. \tag{9}$$

Here $\sigma_{lm}$ is the population covariance of the variates $x_l$ and $x_m$,

$$\psi(t_1, \ldots, t_p) = \int\cdots\int e^{i(t_1x_1 + \cdots + t_px_p)}f(x_1, \ldots, x_p)\,dx_1\cdots dx_p$$

denotes the characteristic function of the parent distribution, and

$$\psi_l = \frac{\partial\psi}{\partial t_l}, \qquad \psi_{lm} = \frac{\partial^2\psi}{\partial t_l\,\partial t_m}.$$

If (9) holds for l, m = 1, 2, …, p, one has a system of partial differential equations which leads to the characteristic function of the multivariate normal distribution.





NOTE ON A METHOD OF SAMPLING

By Carlos E. Dieulefait
National University of Litoral, Argentina

Olds¹ has considered the following problem: Given a lot of r + s items containing s items of a specified kind, items are drawn without replacement until j of the s items have been drawn. The problem is to determine the probability law of n, the number of drawings which have to be made. In the present note we shall consider a certain limiting form for the probability function of n and make some remarks concerning repeated sampling of this type.

If n is the size of a drawing (j ≤ n ≤ r + j), its probability law P(n) is given by

$$P(n) = \binom{n-1}{j-1}\binom{r+s-n}{s-j}\Big/\binom{r+s}{s}.$$

The characteristic function of n is

$$\varphi(t, n) = \frac{\Gamma(s+1)\,e^{itj}}{\Gamma(j)\,\Gamma(s-j+1)}\int_0^1 x^{j-1}(1-x)^{s-j}(1 - x + xe^{it})^r\,dx.$$

Differentiating we find

$$\frac{\varphi'(t, n)}{\varphi(t, n)} = ij + ire^{it}\,\frac{\int_0^1 x^{j}(1-x)^{s-j}(1 - x + xe^{it})^{r-1}\,dx}{\int_0^1 x^{j-1}(1-x)^{s-j}(1 - x + xe^{it})^{r}\,dx}, \tag{1}$$

and hence

$$m_1(n) = \sum P(n)\,n = \left[\frac{\varphi'(t, n)}{i}\right]_{t=0} = \frac{j(r+s+1)}{s+1}.$$

For the calculation of moments about the mean we take

$$\varphi(t, n - m_1) = e^{-itm_1}\varphi(t, n), \tag{2}$$

from which we obtain

$$\left[\frac{\varphi^{(k)}(t, n - m_1)}{i^k}\right]_{t=0} = \sum P(n)(n - m_1)^k = \mu_k(n).$$

In particular, $\mu_2(n) = rj(s-j+1)(r+s+1)/[(s+1)^2(s+2)]$. The values of $m_1(n)$ and $\mu_2(n)$ have already been given by Olds using another method. Putting $rj/(s+1) = \beta$, we have

$$\mu_2 = \beta(1-\beta) + \frac{r^{(2)}(j,2)}{(s+1,2)},$$
$$\mu_3 = 3(1-\beta)\mu_2 - \beta(2 - 3\beta + \beta^2) + \frac{r^{(3)}(j,3)}{(s+1,3)},$$
$$\mu_4 = (6 - 4\beta)\mu_3 - (11 - 18\beta + 6\beta^2)\mu_2 + \beta(6 - 11\beta + 6\beta^2 - \beta^3) + \frac{r^{(4)}(j,4)}{(s+1,4)},$$

where $r^{(k)} = r(r-1)\cdots(r-k+1)$ and $(j, k) = j(j+1)\cdots(j+k-1)$.²

¹ E. G. Olds, Annals of Math. Stat., Vol. 11 (1940), p. 356.
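The factorial-notation expressions above are easy to mistype, so an exact check is useful. The sketch below is written for illustration with hypothetical function names. It tabulates P(n) in exact rational arithmetic and computes $m_1$ and the central moments directly. It then compares them with $j(r+s+1)/(s+1)$ and with the formulas for $\mu_2, \mu_3, \mu_4$ in terms of $\beta = rj/(s+1)$, reading $r^{(k)}(j,k)/(s+1,k)$ as the k-th factorial moment of n − j. The test values of r, s, j are arbitrary.

```python
from fractions import Fraction
from math import comb

def pmf(r, s, j):
    """Exact P(n), n = j, ..., r + j, for drawing until the j-th specified item appears."""
    total = comb(r + s, s)
    return {n: Fraction(comb(n - 1, j - 1) * comb(r + s - n, s - j), total)
            for n in range(j, r + j + 1)}

def falling(a, k):
    """a^(k) = a (a - 1) ... (a - k + 1)."""
    out = 1
    for i in range(k):
        out *= a - i
    return out

def rising(a, k):
    """(a, k) = a (a + 1) ... (a + k - 1)."""
    out = 1
    for i in range(k):
        out *= a + i
    return out

def direct_moments(r, s, j):
    """m_1 and central moments mu_2, mu_3, mu_4 computed straight from P(n)."""
    p = pmf(r, s, j)
    m1 = sum(n * q for n, q in p.items())
    mu = {k: sum((n - m1) ** k * q for n, q in p.items()) for k in (2, 3, 4)}
    return m1, mu

def formula_moments(r, s, j):
    """mu_2, mu_3, mu_4 from the expressions in beta = rj/(s+1)."""
    beta = Fraction(r * j, s + 1)
    F = {k: Fraction(falling(r, k) * rising(j, k), rising(s + 1, k)) for k in (2, 3, 4)}
    mu2 = beta * (1 - beta) + F[2]
    mu3 = 3 * (1 - beta) * mu2 - beta * (2 - 3 * beta + beta ** 2) + F[3]
    mu4 = ((6 - 4 * beta) * mu3 - (11 - 18 * beta + 6 * beta ** 2) * mu2
           + beta * (6 - 11 * beta + 6 * beta ** 2 - beta ** 3) + F[4])
    return {2: mu2, 3: mu3, 4: mu4}

m1, mu = direct_moments(7, 5, 3)
print(m1, mu == formula_moments(7, 5, 3))
```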

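We can obtain a limiting form for P(n) in the following way. Since

$$\varphi\!\left(t, \frac{n-j}{r}\right) = \frac{\Gamma(s+1)}{\Gamma(j)\,\Gamma(s-j+1)}\int_0^1 x^{j-1}(1-x)^{s-j}(1 - x + xe^{it/r})^r\,dx$$

and $(1 - x + xe^{it/r})^r \to e^{itx}$ as $r \to \infty$, we find

$$\lim_{r\to\infty}\varphi\!\left(t, \frac{n-j}{r}\right) = \int_0^1 e^{itx}g(x)\,dx, \tag{3}$$

where

$$g(x) = \frac{\Gamma(s+1)}{\Gamma(j)\,\Gamma(s-j+1)}\,x^{j-1}(1-x)^{s-j}.$$

The interpretation of (3) is that, as $r \to \infty$, the distribution $\{(n-j)/r;\ P(n)\}$ has as its limiting form the Beta distribution with density g(x) on (0, 1).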
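The limit (3) can be watched numerically. The sketch tabulates the exact distribution of (n − j)/r for increasing r and compares its distribution function at one point with that of the Beta(j, s − j + 1) law. For the Beta law it uses the identity, valid for integer parameters, that the distribution function at x equals the probability of at least j successes in s Bernoulli trials with success probability x. The values s = 5, j = 2, x = 0.3 and the function names are arbitrary illustrative choices.

```python
from math import comb, floor

def exact_cdf_scaled(r, s, j, x):
    """P((n - j)/r <= x) from the exact law P(n), n = j, ..., r + j."""
    top = min(j + floor(x * r), r + j)
    num = sum(comb(n - 1, j - 1) * comb(r + s - n, s - j) for n in range(j, top + 1))
    return num / comb(r + s, s)

def beta_cdf_integer(s, j, x):
    """Distribution function of g(x), i.e. Beta(j, s - j + 1), as P(Binomial(s, x) >= j)."""
    return sum(comb(s, k) * x ** k * (1 - x) ** (s - k) for k in range(j, s + 1))

s, j, x = 5, 2, 0.3
for r in (40, 400, 4000):
    print(r, exact_cdf_scaled(r, s, j, x), beta_cdf_integer(s, j, x))
```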

Let $n_1, n_2, \ldots, n_w$ be a sample of size w and $\bar{n} = \sum n_i/w$ its mean. For the characteristic function of $\bar{n}$ we have

$$\varphi(t, \bar{n}) = \prod_{i=1}^{w}\varphi\!\left(\frac{t}{w}, n_i\right) = \left[\varphi\!\left(\frac{t}{w}, n\right)\right]^w.$$

For t = 0 we have $m_1(\bar{n}) = m_1(n)$. But

$$\frac{d^2}{dt^2}\log\varphi(t, \bar{n}) = \frac{1}{w}\left[\frac{d^2}{du^2}\log\varphi(u, n)\right]_{u=t/w},$$

and for t = 0 we arrive at $\sigma^2(\bar{n}) = \mu_2(n)/w$, which leads us to

$$\sigma^2(\bar{n}) = \frac{rj(r+s+1)(s+1-j)}{w(s+1)^2(s+2)}.$$

By the Tchebycheff theorem we obtain

$$P\{|\bar{n} - m_1(n)| < k\sigma(\bar{n})\} \ge 1 - \frac{1}{k^2}.$$

We can take k and w as large as we please. For a given k, $k\sigma(\bar{n})$ can be made arbitrarily small by taking w large enough, so we have the following stochastic limit:

$$\lim_{w\to\infty}\bar{n} = m_1(n).$$

Now, for the standardized mean $\bar{n}^* = (\bar{n} - m_1(n))/\sigma(\bar{n})$ we have

$$\log\varphi(t, \bar{n}^*) = w\log\varphi\!\left(\frac{t}{w\,\sigma(\bar{n})}, n - m_1(n)\right).$$

Remembering (1) and (2), $\log\varphi(u, n - m_1(n)) = -\tfrac{1}{2}\mu_2(n)u^2 + O(u^3)$ as u → 0. Since $w^2\sigma^2(\bar{n}) = w\mu_2(n)$, we find

$$\lim_{w\to\infty}\log\varphi(t, \bar{n}^*) = -\frac{t^2}{2}.$$

This result implies that the distribution $\{(\bar{n} - m_1(n))/\sigma(\bar{n});\ P(\bar{n})\}$ has the limiting normal distribution

$$\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx$$

as $w \to \infty$.

² For a symbolical method of calculation cf. C. Dieulefait, Comptes Rendus, Vol. 208, p. 146.


A SEQUENCE OF DISCRETE VARIABLES EXHIBITING CORRELATION 
DUE TO COMMON ELEMENTS 

By Carl H. Fischer 

University of Michigan

1. Introduction. Studies of correlation due to common elements have been made more or less sporadically over the past thirty years in attempts to throw more light on the meaning of correlation. Numerous examples may be cited. One of the earliest was a study by Kapteyn [1] in which he showed that two sums, each of n elements drawn from a normal population with k elements in common, had a correlation coefficient of k/n. This was considerably generalized by the writer [3], who considered sums of different numbers of elements drawn from quite arbitrary continuous distributions. The work was extended to include sequences of three or more such sums. Antedating this latter paper, Rietz [2] had devised various urn schemata. In one of them, pairs of drawings of s balls each were produced with t balls held in common, and the coefficient of correlation between the numbers of white balls in each of the pairs of drawings was found to be t/s.

Fairly recently some interest has been shown in this subject in connection with the study of heredity. Hence it appeared that it might be of value to present the following study, by elementary methods, of a sequence of discrete variables in which each member is linked to the adjacent members by various specified numbers of common elements.

2. Two variables. A pair of discrete variables is defined as follows. The first, x, is equal to the number of white balls in a set of $s_1$ balls drawn one at a time from an urn which is so maintained that the probability of drawing a white ball is always a constant, p. The second, y, is equal to the number of white balls in a second set of $s_2$ balls. This set is formed by drawing $t_{12}$ balls at random from the $s_1$ balls of the first set, plus $s_2 - t_{12}$ balls drawn directly from the urn. The numbers $s_1$ and $s_2$ may or may not be equal.

Evidently the marginal distribution of x follows the Bernoulli law and is given by $\binom{s_1}{x}p^xq^{s_1-x}$, where q = 1 − p.¹ Let $P(x, y: t_{12})$ denote the bivariate distribution function of x and y with $t_{12}$ balls in common between the two drawings. The first step in finding it is to write the product of three probabilities: that of obtaining x white balls in the first set; that of drawing d of these whites in the $t_{12}$ balls chosen at random from this set; and that of drawing exactly y − d white balls among the $s_2 - t_{12}$ balls drawn directly from the urn to complete the second set. This product may readily be reduced to the form shown below in (1), which is symmetric in x and y and in $s_1$ and $s_2$, and is then summed on d from 0 to $t_{12}$. Thus

$$P(x, y: t_{12}) = \sum_{d=0}^{t_{12}}\binom{t_{12}}{d}\binom{s_1-t_{12}}{x-d}\binom{s_2-t_{12}}{y-d}\,p^{x+y-d}\,q^{s_1+s_2-t_{12}-x-y+d}. \tag{1}$$

¹ By $\binom{a}{b}$ is meant the number of combinations of a items taken b at a time. It shall be understood that $\binom{a}{b} = 0$ if b < 0 or b > a.


The marginal distribution of x has already been given. From the symmetry of (1) it is obvious that the corresponding marginal distribution of y must be characterized by the Bernoulli distribution function

$$\binom{s_2}{y}p^yq^{s_2-y}.$$

The variances of the marginal distributions are $s_1pq$ and $s_2pq$, respectively.

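We next proceed to demonstrate that both of the regression curves are linear and to find the equations of the lines. Consider an array of x on y for some fixed value of y. The mean of the array is

$$\bar{x}_y = \frac{1}{\binom{s_2}{y}p^yq^{s_2-y}}\sum_x x\,P(x, y: t_{12}). \tag{2}$$

The summation in the right member of (2) may be expanded and then rewritten as

$$p^yq^{s_2-y}\sum_{d=0}^{t_{12}}\binom{t_{12}}{d}\binom{s_2-t_{12}}{y-d}\sum_x x\binom{s_1-t_{12}}{x-d}p^{x-d}q^{s_1-t_{12}-x+d}. \tag{3}$$

The inner summation in (3) is seen to equal $d + p(s_1 - t_{12})$, and hence (2) becomes

$$\bar{x}_y = \frac{1}{\binom{s_2}{y}}\sum_{d=0}^{t_{12}}\binom{t_{12}}{d}\binom{s_2-t_{12}}{y-d}\,[d + p(s_1 - t_{12})].$$

Now $\sum_d d\binom{t_{12}}{d}\binom{s_2-t_{12}}{y-d} = \frac{t_{12}y}{s_2}\binom{s_2}{y}$ and $\sum_d\binom{t_{12}}{d}\binom{s_2-t_{12}}{y-d} = \binom{s_2}{y}$. The equation of the line of regression of x on y therefore becomes

$$\bar{x}_y = t_{12}y/s_2 + p(s_1 - t_{12}).$$

By symmetry, the line of regression of y on x may be seen to be

$$\bar{y}_x = t_{12}x/s_1 + p(s_2 - t_{12}).$$

The square of the correlation coefficient is equal to the product of the slopes of the two regression lines, hence

$$r_{xy} = t_{12}/(s_1s_2)^{1/2}.$$

If $s_1 = s_2 = s$ we have the familiar result t/s.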
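The two-variable results can be confirmed by brute force. The sketch below tabulates $P(x, y: t_{12})$ from (1) with exact fractions and checks four things: that the table sums to one, that the marginal of x is binomial, that every array mean of x on y lies on the line $t_{12}y/s_2 + p(s_1 - t_{12})$, and that the correlation equals $t_{12}/(s_1s_2)^{1/2}$. The values p = 1/3, $s_1 = 5$, $s_2 = 3$, $t_{12} = 2$ and the function names are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb, isclose, sqrt

def joint_pmf(s1, s2, t, p):
    """P(x, y : t) from formula (1); p is the constant chance of a white ball."""
    q = 1 - p
    table = {}
    for x in range(s1 + 1):
        for y in range(s2 + 1):
            total = Fraction(0)
            for d in range(min(t, x, y) + 1):
                total += (comb(t, d) * comb(s1 - t, x - d) * comb(s2 - t, y - d)
                          * p ** (x + y - d) * q ** (s1 + s2 - t - x - y + d))
            table[(x, y)] = total
    return table

def moments(table):
    """Means, variances and covariance of (x, y) under the tabulated law."""
    ex = sum(x * w for (x, _), w in table.items())
    ey = sum(y * w for (_, y), w in table.items())
    vx = sum(x * x * w for (x, _), w in table.items()) - ex ** 2
    vy = sum(y * y * w for (_, y), w in table.items()) - ey ** 2
    cov = sum(x * y * w for (x, y), w in table.items()) - ex * ey
    return ex, ey, vx, vy, cov

s1, s2, t, p = 5, 3, 2, Fraction(1, 3)
P = joint_pmf(s1, s2, t, p)
ex, ey, vx, vy, cov = moments(P)
print(float(cov) / sqrt(float(vx * vy)), t / sqrt(s1 * s2))
```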





3. Three variables. A third variable, z, may now be defined as the number of white balls in a set of $s_3$ balls. This set is formed by drawing $t_{23}$ balls at random from the $s_2$ balls of the second set, plus $s_3 - t_{23}$ balls drawn directly from the urn. It is evident from the results on two variables that the marginal distribution of z follows the Bernoulli law and that the equations of the regression lines of z on y and y on z are

$$\bar{z}_y = t_{23}y/s_2 + p(s_3 - t_{23}),$$
$$\bar{y}_z = t_{23}z/s_3 + p(s_2 - t_{23}).$$

The correlation coefficient, $r_{yz}$, is equal to $t_{23}/(s_2s_3)^{1/2}$.

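The relationship between x and z remains to be investigated. Suppose it is specified that the $s_1$ and $s_3$ balls of the first and third sets shall include the same g balls in common. The probability of the joint occurrence of x whites on the first drawing and z whites on the third is then given by the right member of (1) with g, z, and $s_3$ replacing $t_{12}$, y, and $s_2$, respectively; denote it by $P(x, z: g)$. Multiply this expression by the probability that the first and third sets contain exactly g balls in common, and sum the product on g over the range 0 to $t_{23}$. The result is $P(x, z: t_{12}, t_{23})$, the bivariate distribution function of x and z. Thus

$$P(x, z: t_{12}, t_{23}) = \sum_{g=0}^{t_{23}}\frac{\binom{t_{12}}{g}\binom{s_2-t_{12}}{t_{23}-g}}{\binom{s_2}{t_{23}}}\,P(x, z: g). \tag{6}$$

The mean of the array of x on z for any fixed z may be written, after inverting the order of summation:

$$\bar{x}_z = \sum_{g=0}^{t_{23}}\frac{\binom{t_{12}}{g}\binom{s_2-t_{12}}{t_{23}-g}}{\binom{s_2}{t_{23}}}\left[\frac{1}{\binom{s_3}{z}p^zq^{s_3-z}}\sum_x x\,P(x, z: g)\right]. \tag{7}$$

The expression within the square brackets of (7) is identical in form with the right member of (2), and hence we now have

$$\bar{x}_z = \sum_{g=0}^{t_{23}}\frac{\binom{t_{12}}{g}\binom{s_2-t_{12}}{t_{23}-g}}{\binom{s_2}{t_{23}}}\left[\frac{gz}{s_3} + p(s_1 - g)\right].$$

Since the mean of this hypergeometric distribution of g is $t_{12}t_{23}/s_2$, this reduces readily to

$$\bar{x}_z = \frac{t_{12}t_{23}}{s_2s_3}\,z + \frac{s_1s_2 - t_{12}t_{23}}{s_2}\,p. \tag{8}$$

By symmetry,

$$\bar{z}_x = \frac{t_{12}t_{23}}{s_1s_2}\,x + \frac{s_2s_3 - t_{12}t_{23}}{s_2}\,p.$$

The coefficient of correlation between x and z is found to be

$$r_{xz} = \frac{t_{12}t_{23}}{s_2(s_1s_3)^{1/2}}. \tag{9}$$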





It will be observed that

$$r_{xz} = r_{xy}\,r_{yz}. \tag{10}$$

Interesting relationships also exist among the partial and multiple correlation coefficients and the multiple regression surfaces. It is easier here to measure each variate from its mean and to replace the subscripts x, y, and z on r by 1, 2, and 3, respectively. Then the multiple regression equation of each variable on the other two may be conveniently expressed in terms of the cofactors $R_{ij}$ of the correlation determinant. The writer [4] found results for the case where each element $r_{ij}$ (i < j) of the correlation determinant may be expressed as the product $r_{i,i+1}r_{i+1,i+2}\cdots r_{j-1,j}$. From those results we now have

$$R_{11} = 1 - r_{23}^2, \qquad R_{12} = -r_{12}(1 - r_{23}^2), \qquad R_{13} = 0,$$
$$R_{22} = 1 - r_{12}^2r_{23}^2, \qquad R_{23} = -r_{23}(1 - r_{12}^2), \qquad R_{33} = 1 - r_{12}^2.$$

Then the regression planes of x on y and z and of z on x and y are given, respectively, by

$$x = r_{12}\frac{\sigma_1}{\sigma_2}\,y = \frac{t_{12}}{s_2}\,y, \qquad z = r_{23}\frac{\sigma_3}{\sigma_2}\,y = \frac{t_{23}}{s_2}\,y,$$

where $\sigma_i = (s_ipq)^{1/2}$ is the standard deviation of the i-th variable. The regression plane of y on x and z is

$$y = \frac{r_{12}(1 - r_{23}^2)}{1 - r_{12}^2r_{23}^2}\,\frac{\sigma_2}{\sigma_1}\,x + \frac{r_{23}(1 - r_{12}^2)}{1 - r_{12}^2r_{23}^2}\,\frac{\sigma_2}{\sigma_3}\,z = \frac{t_{12}s_2(s_2s_3 - t_{23}^2)\,x + t_{23}s_2(s_1s_2 - t_{12}^2)\,z}{s_1s_2^2s_3 - t_{12}^2t_{23}^2}.$$

The three multiple correlation coefficients are

$$r_{1.23} = r_{12}, \qquad r_{3.12} = r_{23}, \qquad r_{2.13}^2 = \frac{r_{12}^2 + r_{23}^2 - 2r_{12}^2r_{23}^2}{1 - r_{12}^2r_{23}^2}. \tag{11}$$

The partial correlation coefficients are

$$r_{12.3} = r_{12}\left[\frac{1 - r_{23}^2}{1 - r_{12}^2r_{23}^2}\right]^{1/2}, \qquad r_{23.1} = r_{23}\left[\frac{1 - r_{12}^2}{1 - r_{12}^2r_{23}^2}\right]^{1/2}, \qquad r_{13.2} = 0. \tag{12}$$
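Formulas (10) through (12) all follow from $r_{13} = r_{12}r_{23}$, and they can be checked numerically. The sketch below builds the correlation matrix from illustrative values $s_1 = 6$, $s_2 = 5$, $s_3 = 7$, $t_{12} = 3$, $t_{23} = 4$ (our choice). It does not use the cofactor notation of the text. Instead it applies two standard identities to the inverse matrix P: the partial correlation is $-P_{ij}/(P_{ii}P_{jj})^{1/2}$, and the squared multiple correlation is $1 - 1/P_{ii}$. The function name is illustrative.

```python
import numpy as np

def chain_correlations(s1, s2, s3, t12, t23):
    """Simple, partial and squared multiple correlations for the three-variable chain."""
    r12 = t12 / np.sqrt(s1 * s2)
    r23 = t23 / np.sqrt(s2 * s3)
    r13 = t12 * t23 / (s2 * np.sqrt(s1 * s3))   # formula (9)
    R = np.array([[1.0, r12, r13], [r12, 1.0, r23], [r13, r23, 1.0]])
    P = np.linalg.inv(R)
    partial = -P / np.sqrt(np.outer(np.diag(P), np.diag(P)))
    multiple_sq = 1 - 1 / np.diag(P)
    return r12, r23, r13, partial, multiple_sq

r12, r23, r13, partial, msq = chain_correlations(6, 5, 7, 3, 4)
print(round(r13, 6), round(r12 * r23, 6))      # formula (10)
print(np.round(partial, 6))                    # r_13.2 should be 0
print(np.round(msq, 6))                        # squared multiple correlations
```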


4. k variables. A sequence of k variables may be formed successively, as were the three considered above. It will be convenient here to designate the variables by $x_i$ (i = 1, 2, …, k). We also define $h_i$ as the total number of balls held in common between the first and the i-th drawings. Then, as special cases, $h_1 = s_1$ and $h_2 = t_{12}$.





From the preceding results, certain quantities can be written at once: the bivariate distribution functions, regression lines, and correlation coefficients for any two consecutive variables in the sequence, and for any two variables separated by only one other variable.

It is not difficult to derive the bivariate distribution function for $x_1$ and $x_k$ by an extension of the method used in deriving (6). We then have

$$P(x_1, x_k: t_{12}, t_{23}, \ldots, t_{k-1,k}) = \sum_{h_3, \ldots, h_k}\ \prod_{i=3}^{k}\frac{\binom{h_{i-1}}{h_i}\binom{s_{i-1}-h_{i-1}}{t_{i-1,i}-h_i}}{\binom{s_{i-1}}{t_{i-1,i}}}\ P(x_1, x_k: h_k), \tag{13}$$

where $P(x_1, x_k: h_k)$ is the right member of (1) with $h_k$, $x_k$, and $s_k$ replacing $t_{12}$, y, and $s_2$. The equation of the line of regression of $x_1$ on $x_k$ is

$$\bar{x}_1 = \frac{1}{\binom{s_k}{x_k}p^{x_k}q^{s_k-x_k}}\sum_{x_1}x_1\,P(x_1, x_k: t_{12}, t_{23}, \ldots, t_{k-1,k}).$$

This may be reduced, by repeated applications of the steps illustrated in the corresponding case for three variables, to the form

$$\bar{x}_1 = \frac{t_{12}t_{23}\cdots t_{k-1,k}}{s_2s_3\cdots s_k}\,x_k + \frac{s_1s_2\cdots s_{k-1} - t_{12}t_{23}\cdots t_{k-1,k}}{s_2s_3\cdots s_{k-1}}\,p. \tag{14}$$

By symmetry, we have

$$\bar{x}_k = \frac{t_{12}t_{23}\cdots t_{k-1,k}}{s_1s_2\cdots s_{k-1}}\,x_1 + \frac{s_2s_3\cdots s_k - t_{12}t_{23}\cdots t_{k-1,k}}{s_2s_3\cdots s_{k-1}}\,p.$$

Then the simple correlation coefficient between $x_1$ and $x_k$ is

$$r_{1k} = \frac{t_{12}t_{23}\cdots t_{k-1,k}}{s_2s_3\cdots s_{k-1}(s_1s_k)^{1/2}} = r_{12}r_{23}\cdots r_{k-1,k}. \tag{15}$$

The writer [4] has shown two facts for a sequence such as we are considering. First, the multiple correlation coefficient is a function only of the variables immediately adjacent to the one considered. Second, the partial correlation coefficient is zero for any pair except pairs of consecutive variables in the sequence. Thus, the formulas given in terms of simple correlation coefficients for a sequence of three variables may be interpreted so as to cover the case of k variables.
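A short simulation of the urn scheme makes formula (15) concrete. The sketch follows the construction of section 4 literally. Each new set keeps $t_{i-1,i}$ balls chosen at random from the previous set and is completed with fresh draws from the urn. The set sizes, overlaps, p, number of replications, seed and function names are arbitrary illustrative choices. The sample correlations agree with (15) only up to Monte Carlo error, roughly 0.01 here.

```python
import numpy as np

def simulate_chain(s, t, p, reps, seed=0):
    """White-ball counts in each of k sets; s = [s_1..s_k], t = [t_12..t_{k-1,k}]."""
    rng = np.random.default_rng(seed)
    k = len(s)
    counts = np.empty((reps, k), dtype=int)
    for rep in range(reps):
        current = rng.random(s[0]) < p                 # first set: s_1 draws from the urn
        counts[rep, 0] = current.sum()
        for i in range(1, k):
            kept = rng.choice(s[i - 1], size=t[i - 1], replace=False)
            fresh = rng.random(s[i] - t[i - 1]) < p    # completed directly from the urn
            current = np.concatenate([current[kept], fresh])
            counts[rep, i] = current.sum()
    return counts

def r_theory(s, t, i, j):
    """Formula (15) for 0-based indices i < j: product of the adjacent correlations."""
    r = 1.0
    for a in range(i, j):
        r *= t[a] / np.sqrt(s[a] * s[a + 1])
    return r

s, t, p = [6, 5, 7, 4], [3, 4, 2], 0.4
counts = simulate_chain(s, t, p, reps=10_000)
sample_r = np.corrcoef(counts, rowvar=False)
for i, j in [(0, 1), (0, 2), (0, 3)]:
    print(i, j, round(sample_r[i, j], 3), round(r_theory(s, t, i, j), 3))
```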


REFERENCES 

[1] J. C. Kapteyn, "Definition of the correlation-coefficient," Monthly Notices Roy. Astron. Soc., Vol. 72 (1912), pp. 518-525.

[2] H. L. Rietz, "Urn schemata as a basis for the development of correlation theory," Annals of Math., Vol. 21 (1920), pp. 306-322.

[3] C. H. Fischer, "On correlation surfaces of sums with a certain number of random elements in common," Annals of Math. Stat., Vol. 4 (1933), pp. 103-126.

[4] C. H. Fischer, "On multiple and partial correlation coefficients of a certain sequence of sums," Annals of Math. Stat., Vol. 4 (1933), pp. 278-284.



REPORT OF THE NEW YORK MEETING OF THE INSTITUTE

The Seventh Annual Meeting of the Institute of Mathematical Statistics was held from Saturday to Tuesday, December 27-30, 1941, in conjunction with the meetings of the Allied Social Science Associations. With the exception of the session on Tuesday afternoon, all meetings were held at the [illegible] Hotel. The following one hundred seventy-seven members of the Institute attended the meeting:*

F. L. Alt, H. E. Arnold, K. J. Arnold, L. A. Aroian, [illegible], [illegible], I. L. Battin, B. M. Bennett, Carl Bennett, Joseph Berkson, [illegible], [illegible], C. I. Bliss, A. J. Bonis, Paul Boschan, A. H. Bowker, [illegible], A. E. Brandt, [illegible], R. W. Burgess, J. H. Bushey, Belle Calderon, [illegible], A. C. Cohen, Jr., M. S. Cohen, [illegible], [illegible], L. M. Court, [illegible], Gertrude Cox, C. C. Craig, B. B. Day, D. B. DeLury, W. E. Deming, [illegible], H. F. Dodge, H. F. Dorn, [illegible], David Durand, [illegible], P. S. Dwyer, Churchill Eisenhart, W. F. Elkin, J. S. Elston, [illegible], [illegible], [illegible], Willy Feller, J. W. Fertig, Irving Fisher, W. C. Flaherty, M. M. Flood, L. R. Frankel, H. A. Freeman, G. R. Clause, Hilda Geiringer, [illegible], [illegible], J. I. Griffin, G. C. Grove, F. E. Grubbs, E. J. Gumbel, M. J. Hagood, [illegible], M. H. Hansen, Myron Heidingsfield, Edward Helly, G. M. Hopper, Harold Hotelling, [illegible], William Hurwitz, Seymour Jablon, [illegible], [illegible], [illegible], Karl Karsten, Leo Katz, C. J. Kiernan, B. F. Kimball, A. J. King, L. F. Knudsen, H. S. Konijn, Tjalling Koopmans, R. L. Kotelka, A. K. Kurtz, [illegible], [illegible], Jack Laderman, Oscar Lange, D. H. Leavens, B. A. Lengyel, Howard Levene, Ida Levin, [illegible], Irving Lorge, A. J. Lotka, Eugene Lukacs, G. A. Lundberg, P. J. McCarthy, W. G. Madow, Benjamin Malzberg, Henry Mann, Jacob Marschak, J. W. Mauchly, [illegible], Margaret Merrell, J. N. Michio, J. R. Miner, Nathan Morrison, J. E. Morton, [illegible], [illegible], Harold Nisselson, [illegible], [illegible], Nilan Norris, [illegible], [illegible], E. G. Olds, P. S. Olmstead, J. G. Osborne, [illegible], Edward Paulson, [illegible], Victor Perlo, J. M. Porotti, L. M. Petit, G. A. D. Preinreich, [illegible], [illegible], L. J. Reed, F. V. Reno, [illegible], Selby Robinson, H. G. Romig, A. C. Rosander, Ernest Rubin, H. A. Ruger, P. A. Samuelson, M. M. Sandomire, Max Sasuly, F. E. Satterthwaite, Henry Scheffé, H. L. Schug, H. A. Secrist, [illegible], W. A. Shelton, R. W. Shephard, W. A. Shewhart, H. M. Shulman, Harry Siller, R. R. Singleton, I. E. Smart, J. H. Smith, G. W. Snedecor, Emma Spaney, Mortimer Spiegelman, [illegible], M. S. Stevens, J. S. Stock, M. M. Torrey, M. N. Torrey, W. R. Van Voorhis, D. F. Votaw, Jr., W. C. Waite, H. M. Walker, W. A. Wallis, A. N. Watson, E. B. Wilson, [illegible], Jacob Wolfowitz, M. A. Woodbury, W. J. Youden, Joseph Zubin.

The opening session on Saturday afternoon on The Role of Tests of Significance in Biological Research was held jointly with the Biometrics Section of the American Statistical Association. Professor E. B. Wilson of the Harvard School of Public Health acted as chairman. The session was in the form of a round table discussion, the principal discussants being: W. Edwards Deming, Bureau of the Census; Harold Hotelling, Columbia University; Lowell J. Reed, Johns Hopkins University; and George W. Snedecor, Iowa State College.


* The list of attendance has been compiled from the registration list supplied by the Director of the New York Convention and Visitors Bureau.




On Saturday evening, under the chairmanship of Dr. Walter A. Shewhart of Bell Telephone Laboratories, a session was held jointly with the Econometric Society on Theory of Runs and Confidence Intervals. The following program was presented:

1. The theory of runs in random data.

Harold T. Davis, Northwestern University.

2. Time series significance tests based on signs of differences.

Geoffrey H. Moore, Rutgers University.

W. Allen Wallis, Stanford University.

3. Confidence intervals for the unknown median of any type of universe.

John H. Smith, University of Chicago.

The morning and afternoon sessions on Sunday on Numerical Computational Devices were held jointly with the American Statistical Association, with the cooperation of the Committee on Addresses in Applied Mathematics of the American Mathematical Society. Dr. C. H. Langmuir of the Carnegie Foundation for the Advancement of Teaching acted as chairman of the morning session on Statistical and Matrix Calculation. The following papers were presented:

1. Some matrix methods in least squares and other multivariate problems.

Harold Hotelling, Columbia University.

2. The Mallock electrical calculating machine for solving simultaneous linear equations.

Elizabeth Monroe Boggs, Cornell University.

3. Mathematical operations with punched cards.

J. C. McPherson, International Business Machines Corporation.

4. Recent developments in correlation technique.

Paul S. Dwyer, University of Michigan.

The subject of the afternoon session was Mechanical Solution of Differential Equations. Dr. R. M. Foster of the Bell Telephone Laboratories presided for the following program:

1. Punch card calculation of orbits.

W. J. Eckert, Naval Observatory.

2. Punch card methods for solving linear differential equations of second order.

Martin Schwarzschild, Columbia University.

3. Differential analyzers.

Harold L. Hazen, Massachusetts Institute of Technology.

Discussants:

L. S. Dederick, Aberdeen Proving Ground.

Norbert Wiener, Massachusetts Institute of Technology. 

Professor Helen Walker of Columbia University held the chair at the Sunday evening session, a joint session with the American Statistical Association. The following program was given under the title: On Some Technical Aspects of Sampling.

1. On the relative efficiencies of various areal sampling units in population inquiries.

M. H. Hansen, Bureau of the Census.

William Hurwitz, Bureau of the Census.





2. On the monthly sample survey of unemployment.

L. R. Frankel, Work Projects Administration.

J. S. Stock, Work Projects Administration.

3. On certain biases in surveys by questionnaire.

J. Cornfield, Bureau of Labor Statistics.

4. On the relation of probability to sampling.

W. G. Madow, Bureau of the Census.

5. Recent developments in sampling for agricultural statistics.

G. W. Snedecor, Iowa State College.

A. J. King, Iowa State College.

Discussants:

W. G. Cochran, Iowa State College.

J. A. Greenwood, Duke University.

Another joint session with the American Statistical Association was held on Monday morning. The topic considered was: What Can the Census Do With Sampling? Professor L. Edwin Smart of Ohio State University presided for the following program:

1. An appraisal of the 1940 sampling scheme.

T. O. Yntema and Dickson H. Leavens, Cowles Commission for Research in Economics.

2. Some requirements of sampling design and presentation. 

W. Edwards Deming, Bureau of the Census.

3. Compromises, losses, and gains brought about by the introduction of sampling.

L. E. Truesdell, Bureau of the Census.

4. The proposed annual sample census.

Philip M. Hauser, Bureau of the Census.

Discussants:

A. N. Watson, Curtis Publishing Company.

F. F. Stephan, Office of Production Management.

S. A. Stouffer, University of Chicago.

On Monday afternoon, a session was held for the reading of contributed 
papers on Probability and Statistics. Professor Harold Hotelling acted as 
chairman, and the following papers were read: 

1. Scanning data to determine significance of difference between frequency of an event in contrasted groups.

Joseph Zubin, New York State Psychiatric Institute.

2. Compounding probabilities from independent significance tests.

W. Allen Wallis, Stanford University.

3. A class of multivariate distributions.

Walter Jacobs, Securities and Exchange Commission.

4. Definition of the probable error.

E. J. Gumbel, New School for Social Research.

5. A generalized analysis of variance.

F. E. Satterthwaite, University of Iowa.

6. On the power function of the analysis of variance test.

Abraham Wald, Columbia University.

7. Method of computing the roots of cubic and quartic equations by hyperbolic and circular functions.

E. E. Blanche, Michigan State College.





8. Additive partition functions.

J. Wolfowitz, Columbia University.

9. Limited type of probability distribution applied to flood flows (Preliminary report).

B. F. Kimball, New York State Public Service Commission.

Abstracts of these papers follow this report.

Professor Harold Hotelling acted as chairman for the session on Tuesday 
morning, held jointly with the Econometric Society and the American Statistical 
Association. The program consisted of invited addresses on Recent Advances in 
Mathematical Statistics by Professors Burton H. Camp of Wesleyan University
and Cecil C. Craig of the University of Michigan.

The session on Tuesday afternoon was held at The Boyce Thompson Institute, Yonkers, New York. It was a joint session with the Biometrics Section of the American Statistical Association on The Design of Experiments. Dr. W. J. Youden of The Boyce Thompson Institute acted as chairman and had various experimental designs on display in the greenhouse. Through the courtesy of members of the Institute staff, transportation between the railroad station and the Institute was provided. After the program, tea was served. The following papers were read:

1. Biological interpretation of interactions.

W. C. Jacobs, Cornell University.

2. Adapting the design to the experiment.

Gertrude M. Cox, North Carolina State College.

3. Sampling theory when the sampling units are of unequal size.

W. G. Cochran, Iowa State College.

4. Sampling errors of systematic and random surveys of cover type areas.

J. G. Osborne, U. S. Forest Service.

A luncheon meeting Monday noon was held jointly with the Econometric Society and was attended by ninety-four persons. Professor W. C. Mitchell of Columbia University presided and called on Irving Fisher, Harold Hotelling, W. G. Cochran, and W. A. Wallis for brief remarks.

The annual business meeting of the Institute was held late Monday afternoon, with President Hotelling presiding.

The report of the Secretary-Treasurer was read. The report appears on 
pp. 107-109. 

President Hotelling stated that Mr. George W. Petrie, III, had audited the 
books and records of the Treasurer and found them to be in agreement with 
the Report presented. 

Dr. Madow, who acted as teller, reported that the mail balloting had resulted in the election of the following officers for 1942:

President: Professor C. C. Craig

Vice-Presidents: Professor A. T. Craig
Mr. E. C. Molina

Secretary-Treasurer: Professor E. G. Olds





After discussing various ways of broadening the service of the Institute, a motion was carried which recommended that the Board of Directors appoint committees to study the following matters: junior memberships, local chapters, and advertising for the official journal. Later the Board approved this recommendation and committees were appointed.

Edwin G. Olds,

Secretary


REPORT OF THE DALLAS MEETING OF THE INSTITUTE 

The twelfth meeting of the Institute was held jointly with the meetings of Section A of the American Association for the Advancement of Science and of the Econometric Society in Dallas on December 29-30, 1941. Professor Dunham Jackson, Secretary of Section A of the A. A. A. S., has kindly sent the following information regarding the meeting:

Sessions of the joint meeting of the Institute of Mathematical Statistics with the Econometric Society and Section A of the A. A. A. S. were held Monday afternoon, December 29, and Tuesday morning and afternoon, December 30, at Southern Methodist University. The number of contributed papers offered on Tuesday was such as to cause extension of the session into the afternoon.

On Monday afternoon addresses were delivered, in accordance with the programs issued in advance, by Professor A. B. Coble of the University of Illinois, retiring Vice President for the Section, on A Certain Set of Ten Points in Space, and Professor S. S. Wilks of Princeton University on Representative Sampling.

The order of papers on Tuesday was as follows: 

1. On the theory of the tetrahedron.

N. A. Court, University of Oklahoma.

2. A method for integrating the linear hyperbolic equation in three independent variables.

E. W. Titt, University of Texas.

3. On powers of a matrix whose elements are sets of points.

S. T. Sanders, Jr., Southwestern Louisiana Institute.

4. Analytic theory of parametric linear partial differential equations.

W. J. Trjitzinsky, University of Illinois.

5. The theory of the Riesz integral.

H. J. Ettlinger, University of Texas.

6. Obtaining differences from tables which are in the form of punched cards.

Harry Pelle Hartkemeier, University of Missouri.

7. On investment and the valuation of capital.

Montgomery D. Anderson, University of Florida.

8. Advantages of singling out degrees of freedom in analyses of variance.

W. D. Baten, Michigan State College.

9. The incidence of an income tax on saving.

Abram Bergson, University of Texas.

10. Certain tests for randomness applied to data grouped in small sets.

Edward L. Dodd, University of Texas.





11. Stratified sampling (Preliminary Report).

A. M. Mood, University of Texas.

12. On convergence factors in convergent integrals.

Charles N. Moore, University of Cincinnati.

13. Geometric statement of a fundamental theorem for four-dimensional orthographic axonometry.

W. H. Roever, Washington University.

14. A certain non-metric Moore space.

F. B. Jones, University of Texas.

Abstracts of papers 8, 10, and 11 follow this report.

Papers 1 to 8 inclusive on this list were presented Tuesday morning, and 
papers 9 to 14 at the afternoon session. In the absence of the authors, papers 
10 and 12 were read by title. 

The presiding officer Monday afternoon was Professor G. T. Whyburn of
the University of Virginia, Chairman of the Section and Vice President of the 
A. A. A. S. On Tuesday Professor H. J. Ettlinger of the University of Texas 
presided for papers 1 to 4 inclusive, and Professor S. S. Wilks of Princeton 
University for the rest of the program. 

Edwin G. Olds, 

Secretary 


ANNUAL REPORT OF THE SECRETARY-TREASURER OF THE 

INSTITUTE 

On September 2-4, the Institute met at the University of Chicago, in conjunction with meetings of the American Mathematical Society, Mathematical Association of America, and Econometric Society. Sixty-eight members of the Institute attended the meeting.

As mentioned in the 1940 report of the Secretary, the Institute became affiliated with the American Association for the Advancement of Science at the close of 1940. President Hotelling appointed Professor Truman L. Kelley as the representative of the Institute on the Executive Council of the A.A.A.S. for 1941.

On December 29-30, 1941, the Institute held two joint sessions with Section A of the A.A.A.S. and the Econometric Society in connection with the Annual Meeting of the A.A.A.S. at Dallas, Texas. Professor Wilks gave an address at one of the sessions. The report of the Seventh Annual Meeting of the Institute appears on pp. 102-106.

The Institute was invited to send an official representative to the Academic Festival of the University of Chicago, September 27-29, 1941. Mr. John F. Kenney was appointed as the representative of the Institute.

During the past year, the Secretary has received a number of inquiries from members regarding opportunities for doing statistical work in business, government, and industry. While the Institute has no particular organization for such service, the Secretary will be glad to supply information regarding positions which come to his attention.





The Institute has printed an official abstract blank to be used in submitting abstracts for contributed papers. A supply of these blanks can be obtained by writing to the Secretary.

The deaths of two of the members of the Institute have been reported since the last Annual Meeting: Professor James W. Glover, University of Michigan, and Mr. M. C. MacLean, Dominion Bureau of Statistics, Ottawa.

The following financial statement covers the period from January 1, 1941 to 
December 10, 1941: 


RECEIPTS

Balance, January 1, 1941 .......................... $127.47
Rockefeller Foundation Grant ...................... 1,000.00
Dues
Subscriptions
Sales of back numbers
Miscellaneous

Total Receipts

EXPENDITURES

Annals Office
    Editorial Expenses, 1940
    Editorial Expenses, 1941
    Printing
Waverly Press
    Printing and Mailing Annals (4 issues)
Back Numbers Office
    Postage and mailing, 1940
    Postage and mailing, 1941
    Insurance
    Purchase of back numbers from H. C. Carver
    Reprinting 200 copies of Vol. V, No. 3
Membership Committee
Secretary-Treasurer's Office
    Filing Case
    Printing and Supplies
    Postage, telegram, and express
    Clerical Help
    Printing Programs for Meetings
    Miscellaneous

Total Expenditures
Balance on hand, December 10, 1941

                                                    $6,046.25

In comparison with the financial condition of the Institute at the end of 1940, the receipts from dues, subscriptions, and sales of back numbers have increased by nearly two thousand dollars. This is largely due to a net increase of 171 members and 20 subscriptions. Early in the year the Institute received the last thousand dollars of its grant from the Rockefeller Foundation. This source of income has materially assisted the Institute in surviving a period of financial uncertainty. Its loss will be severely felt.

The expenditures of the Institute show a slight decrease, partly due to the fact that fewer back issues of the Annals had to be reprinted. An unnecessarily large item of expense is that of the postage which has to be paid because of the slowness of some members and subscribers in paying dues and reporting changes of address. Many copies of the Annals have to be reclaimed and mailed a second time. Members could save the Institute considerable expense if they would pay their dues promptly and report changes of address well in advance of publication dates of the Annals.

Financial prospects for 1942 are mixed. The importance of the statistical approach to problems of national defense has caused increased interest in mathematical statistics, with the result that many people employed in government service or industry are applying for membership and urging their libraries to subscribe to the Annals. On the other hand, delivery to, and collection from, foreign libraries is becoming increasingly difficult, and a marked decrease in the number of foreign subscriptions can be anticipated. Furthermore, operating expenses of the Institute are almost certain to increase as material and labor costs advance. On the whole, it seems very probable that it will require the full co-operation of all the members to avoid operation at a loss during the next calendar year.

Edwin G. Olds,
Secretary-Treasurer.

December 29, 1941



ABSTRACTS OF PAPERS 

I. Presented on December 27, 1941, at the New York Meeting of the Institute 

A Generalized Analysis of Variance. Franklin E. Satterthwaite, University of Iowa and Aetna Life Insurance Company.

This paper examines the fundamental principles underlying designs for the analysis of variance. Given several statistics of the type $\chi^2 = \sum \theta^2$, where the $\theta$'s are arbitrary orthogonalized linear functions of certain underlying normal data, $x$, a rule is set up for determining a set of $m$'s as linear functions of the $x$'s such that $\sum (x_i - m_i)^2$ will be independent of the remaining $\chi^2$'s. Further, it is shown that, simultaneously with the above, the $x$'s and the $\theta$'s may be subjected to certain types of linear restrictions (for the purpose of estimating parameters or otherwise) without disturbing the distributions or the independence relations except for the appropriate reduction in degrees of freedom. The rule used to determine the $m$'s gives results consistent with the standard designs for the analysis of variance. However, it goes further in that one may use weighted rather than simple averages in setting up his design. A practical application of this is the two-way analysis of data which are averages and lack homogeneity of variance, though constants of proportionality between the variances are known. The two-way analysis of incomplete data is another practical problem which is solved by the simple expedient of a zero weight. The use of weighted averages frequently introduces difficulties in estimating parameters, particularly the mean. The combination of the linear restriction concept with standard analysis of variance methods solves this difficulty.

On the Power Function of the Analysis of Variance Test. Abraham Wald, Columbia University.

It is known that the power function of the analysis of variance test depends only on a single parameter, say $\lambda$, where $\lambda$ is a certain function of the parameters involved in the distribution of the sample observations. Let $Z$ be any critical region (subset of the sample space) whose size does not depend on unknown parameters, i.e., it has the same size for all values of the parameters which are compatible with the hypothesis to be tested. It is shown that for any positive $c$ the average power (a certain weighted integral of the power function) of the region $Z$ over the surface $\lambda = c$ cannot exceed the power of the analysis of variance test on the surface $\lambda = c$ (the power of the latter test is constant on the surface $\lambda = c$). P. L. Hsu's result (Biometrika, January, 1941, pp. 62-68) follows from this as a corollary.

Definition of the Probable Error. E. J. Gumbel, The New School for Social 
Research. 

The probable error is usually defined either as the semi-interquartile range or as 0.6745 of the standard error. We define it as half of the smallest interval that has the probability $\frac{1}{2}$. For distributions whose density never increases, this interval runs from the origin to the median; for distributions whose density never decreases, it runs from the median to the end of the distribution. In general the probable error $p$ is the solution of the equations

$$W(\xi + p) - W(\xi - p) = \tfrac{1}{2}, \qquad w(\xi + p) = w(\xi - p),$$

where $w$ is the density, $W$ its cumulative distribution, and $\xi$ denotes the midpoint of the interval. For symmetrical distributions the first definition remains valid. For the Gaussian distribution the second definition holds besides. The numerical values for the midpoint $\xi$ and the probable error $p$ are given for some distributions usual in statistics. The calculation of the standard error of the probable error, which depends upon the distribution $w(x)$, determines whether the probable error is more or less precise than the standard error. For the asymmetrical exponential distribution the mean and the median have the same precision, and the probable error is more precise than the standard error. For the first law of Laplace, and for Galton's reduced distribution, the median and the probable error are more precise than the mean and the standard error. For Maxwell's distribution the mean and the probable error are more precise than the median and the standard error.
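To make the definition concrete, the Python sketch below finds the shortest interval with probability 1/2 from a distribution's quantile function and returns its half-width $p$ and midpoint $\xi$. It assumes a unimodal continuous distribution, for which the shortest interval coincides with the solution of the two equations above; the ternary search over the lower tail mass is our own numerical device, not a procedure taken from the paper, and the function name `probable_error` is hypothetical.

```python
import math
from statistics import NormalDist


def probable_error(quantile, prob=0.5, eps=1e-12, iters=200):
    """Return (p, xi): half-width and midpoint of the shortest interval
    carrying probability `prob`.

    Assumes a unimodal distribution, so that the interval width
    quantile(a + prob) - quantile(a) is unimodal in the lower tail mass a.
    """
    def width(a):
        return quantile(a + prob) - quantile(a)

    lo, hi = eps, 1.0 - prob - eps
    for _ in range(iters):  # ternary search for the minimizing tail mass
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if width(m1) <= width(m2):
            hi = m2
        else:
            lo = m1
    a = (lo + hi) / 2
    left, right = quantile(a), quantile(a + prob)
    return (right - left) / 2, (right + left) / 2


# Gaussian with unit standard error: p is the classical 0.6745, xi = 0.
p_norm, xi_norm = probable_error(NormalDist().inv_cdf)
# Exponential with unit mean: the interval runs from the origin to the median.
p_exp, xi_exp = probable_error(lambda u: -math.log1p(-u))
print(round(p_norm, 4), round(p_exp, 4))  # prints 0.6745 0.3466
```

For the Gaussian case the result reproduces the conventional 0.6745; for the exponential case the half-width is half the median, $\tfrac{1}{2}\log 2$, consistent with the statement that the interval starts at the origin.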

A Class of Multivariate Distributions. Walter Jacobs, Securities and Exchange Commission, Washington.

The multivariate normal distribution has the property that its probability density is constant along the surface of a hyper-ellipsoid. The class of distributions characterized by this property is considered. The form of the characteristic function of any distribution of the class is determined; in this way the parameters of the distribution are shown to be simply related to the first and second moments, when these exist.

Every distribution of the class is the $n$-variate extension of a univariate symmetrical distribution. The method of determining the form of the extension of such a univariate distribution is given. A number of properties of regression for the multivariate normal distribution are shown to hold for any distribution of the class. Among other properties considered is the form of some sampling distributions. Some special cases of interest, including the extensions of the Cauchy distribution and the median law, are discussed briefly.

Methods for Scanning Data to Determine the Significance of the Difference 
Between the Frequency of an Event in Contrasted Groups. Joseph Zubin, 
N. Y. S. Psychiatric Institute, New York. 

In many investigations in Psychology, Sociology, Economics and Public Health, there is a need for a quick and ready method for scanning a mass of data in order to select the items that have a significant bearing on the problem under investigation. The statistical procedure for this item analysis consists essentially of evaluating the 2 × 2 tables which arise when two groups are contrasted for the presence and absence of a given character or event. The chi-square method or its equivalent, the ratio of the difference between per cents to its standard error, requires considerable labor and time, and several methods have been proposed for shortening the work. Recently a method was developed which eliminates the need for computing percentages or expected values, the analysis being made with the absolute frequencies. This method depends upon transforming $p$, the per cent, to the inverse sine function of $\sqrt{p}$. The method is applicable not only to 2 × 2 tables but can also be made applicable to 2 × n tables and r × n tables with the aid of simple formulae.
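For illustration, the sketch below compares two groups with the angular transformation $\phi = 2\arcsin\sqrt{p}$, whose sampling variance is approximately $1/n$, and returns a normal deviate. This is the common textbook form of the inverse-sine comparison, assumed here; Zubin's own procedure, which works directly from absolute frequencies, is not reproduced, and the name `arcsine_z` is hypothetical.

```python
import math


def arcsine_z(x1, n1, x2, n2):
    """Normal deviate for the difference in the frequency of an event between
    two groups, using phi = 2*asin(sqrt(p)) with variance roughly 1/n."""
    if n1 <= 0 or n2 <= 0 or not (0 <= x1 <= n1 and 0 <= x2 <= n2):
        raise ValueError("need n > 0 and 0 <= x <= n for both groups")
    phi1 = 2 * math.asin(math.sqrt(x1 / n1))
    phi2 = 2 * math.asin(math.sqrt(x2 / n2))
    return (phi1 - phi2) / math.sqrt(1 / n1 + 1 / n2)


# 30 of 50 show the event in one group, 20 of 50 in the other.
print(round(arcsine_z(30, 50, 20, 50), 2))  # prints 2.01
```

Because only counts and group sizes enter, a table of $2\arcsin\sqrt{p}$ lets many items be scanned with little arithmetic, which is the practical point of the transformation.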

Compounding Probabilities from Independent Significance Tests. W. Allen 
Wallis, Stanford University. 

For combining the probabilities obtained from $N$ independent tests of significance into a single measure, the product of the $N$ independent probabilities provides a criterion which, though rarely ideal, is usually satisfactory. The probability that such a product will be less than $Q$ always exceeds $Q$, and is the sum of the first $N$ terms in a Poisson series whose parameter is $-\log_e Q$; since this sum is also the probability that a value of $\chi^2$ based on $2N$ degrees of freedom will exceed $-2\log_e Q$, existing tables of $\chi^2$ may (as R. A. Fisher has pointed out in Statistical Methods for Research Workers, section 21.1) be used to test the significance of a product of probabilities. If any of the probabilities have been derived from discontinuous distributions, as is likely with small samples of non-metric data, this method of calculating the probability of the product fails; in such instances it invariably overstates the probability of the product. Formulas are given for various special cases arising frequently in practice and also for the general case of $D + C$ tests of which $D$ are based on discontinuous distributions and $C$ on continuous distributions. In several illustrative examples, the overstatement of the joint probability consequent upon neglect of discontinuities is of the order of 100 to 200 per cent.
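The Poisson-series form of the combined probability is short enough to compute directly. The sketch below, with the hypothetical name `product_probability`, assumes all $N$ tests are based on continuous distributions; the corrections for discontinuous distributions described in the abstract are not attempted.

```python
import math


def product_probability(pvalues):
    """P(product of N independent uniform P-values <= Q), Q = observed product.

    Equals Q times the first N terms of a Poisson series with parameter -ln Q,
    i.e. P(chi-square on 2N d.f. > -2 ln Q). Valid only for continuous tests.
    """
    n = len(pvalues)
    if n == 0 or any(not 0 < p <= 1 for p in pvalues):
        raise ValueError("need at least one P-value in (0, 1]")
    log_q = sum(math.log(p) for p in pvalues)  # ln Q, summed to avoid underflow
    lam = -log_q                               # Poisson parameter
    term, total = 1.0, 1.0                     # k = 0 term
    for k in range(1, n):                      # terms k = 1 .. N-1
        term *= lam / k
        total += term
    return math.exp(log_q) * total


print(round(product_probability([0.1, 0.1]), 4))  # prints 0.0561
```

For two tests each giving 0.1 the product is 0.01, yet the probability of so small a product is about 0.056, illustrating why the product itself cannot be read as a probability when $N > 1$.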

A Method of Computing the Roots of the General Cubic Equation with Real or 
Complex Coefficients. Ernest E. Blanche, Michigan State College. 

The general cubic equation with real or complex coefficients may readily be reduced to the form $y^3 + 3Hy + G = 0$. Suitable substitutions for $y$ in the reduced equation permit the use of the identities for hyperbolic functions and circular functions: $\sin 3x$, $\cos 3x$, $\sinh 3x$, $\cosh 3x$, and $\sin(u + iv)$. The following classifications may be set up: (A) If $G < 0$ and $H > 0$, the only real root is $y = 2\sqrt{H}\,\sinh z$, where $\sinh 3z = -G/(2H\sqrt{H})$. (B-1) If $G < 0$, $H < 0$, and $G/(2H\sqrt{-H}) \le 1$, there are three real roots, obtained by use of the circular identity for $\cos 3x$. (B-2) If $G < 0$, $H < 0$, and $G/(2H\sqrt{-H}) > 1$, the only real root is $y_1 = 2\sqrt{-H}\,\cosh z$, where $\cosh 3z = G/(2H\sqrt{-H})$; the complex roots are $-\frac{1}{2}y_1 \pm hi$. The general cubic with complex coefficients has solutions $y_{n+1} = -2\sqrt{-H}\,\sin(u + 2n\pi/3 + iv)$ for $n = 0, 1, 2$, where $\sin(3u + 3iv) = a + bi = M$. For $M$ real, the special cases are similar to (A), (B-1), and (B-2).
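The substitution for the complex case can be tried numerically. The Python sketch below reduces the general cubic to $y^3 + 3Hy + G = 0$ by the usual shift $x = y - b/(3a)$ and then uses $y = -2\sqrt{-H}\,\sin w$ with complex arithmetic, so that $\sin 3w = G/(2H\sqrt{-H})$ and one formula covers cases (A), (B-1), and (B-2) instead of the case split. The reduction formulas are standard; the function name `cubic_roots` and the separate handling of $H = 0$ are our own assumptions.

```python
import cmath
import math


def cubic_roots(a, b, c, d):
    """Roots of a*x**3 + b*x**2 + c*x + d = 0, real or complex coefficients.

    Reduces to y**3 + 3*H*y + G = 0 with x = y - b/(3a), then substitutes
    y = -2*sqrt(-H)*sin(w), which turns the cubic into sin(3w) = M.
    """
    if a == 0:
        raise ValueError("leading coefficient must be nonzero")
    shift = b / (3 * a)
    p = (3 * a * c - b * b) / (3 * a * a)
    q = (2 * b**3 - 9 * a * b * c + 27 * a * a * d) / (27 * a**3)
    H, G = p / 3, q
    if H == 0:
        # Degenerate case y**3 = -G: the three cube roots of -G.
        r = complex(-G) ** (1 / 3)
        ys = [r * cmath.exp(2j * math.pi * n / 3) for n in range(3)]
    else:
        s = cmath.sqrt(-H)
        M = -G / (2 * s**3)          # equals G / (2H sqrt(-H))
        w0 = cmath.asin(M) / 3
        # sin(3(w0 + 2n*pi/3)) = M for n = 0, 1, 2 gives the three roots
        ys = [-2 * s * cmath.sin(w0 + 2 * math.pi * n / 3) for n in range(3)]
    return [y - shift for y in ys]


# x**3 - 6x**2 + 11x - 6 = (x - 1)(x - 2)(x - 3); roots come back as complex numbers
print(cubic_roots(1, -6, 11, -6))
```

Working in complex arithmetic throughout trades the explicit sign tests of the abstract for a single branch of $\arcsin$; for real coefficients the imaginary parts of the real roots come out as rounding noise.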


Limited Type of Probability Distribution Applied to Flood Flows (Preliminary Report). Bradford F. Kimball, Port Washington, N. Y.

Relative to Gumbel's recent paper on flood flows (E. J. Gumbel, “The return period of flood flows,” Annals of Mathematical Statistics, Vol. 12 (1941)), the author points out that Gumbel's argument that the probability distribution of maximum values does not stem from a limited form of primary probability distribution of the stream flow is misleading (see page 177, loc. cit.). One might argue for a primary probability distribution of stream flows of the type $dV = (2\pi)^{-1/2} e^{-u^2/2}\,du$, where $u = -k\log(a - x)$, $0 \le x \le a$, and $x$ is the measure of flow. This increment of $x$ is related to the normal probability increment by the linear equation $k\,dx = (a - x)\,du$. This distribution will not satisfy the condition that von Mises uses in his argument concerning a finite distribution, since the cumulative distribution $V$ does not possess a positive derivative of finite order at $x = a$. Also, although $x$ does not have infinite range, the transformed variate $u$ has an infinite range to the right, and will satisfy von Mises' argument for the derivation of the cumulative distribution of the maxima, of the form $\exp[-\exp\{-\alpha(u - u_0)\}]$ in terms of $u$. The author finds that such a distribution more accurately describes the behavior of maximum annual flood flows than one which ignores the existence of an upper limit $a$.
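The bounded model can be made concrete by composing the double-exponential law in $u$ with the transformation $u = -k\log(a - x)$. The sketch below assumes that reading of the abstract's formulas; the function names and the parameter values for `a`, `k`, `alpha`, and `u0` are hypothetical, chosen only to show that, unlike an unbounded fit, the probability of an annual maximum below $x$ reaches 1 exactly at the upper limit $a$.

```python
import math


def max_flow_cdf(x, a, k, alpha, u0):
    """P(annual maximum flow <= x) under the bounded model: u = -k*log(a - x)
    follows the double-exponential law exp(-exp(-alpha*(u - u0))), and no
    flow can exceed the upper limit a."""
    if x >= a:
        return 1.0
    u = -k * math.log(a - x)
    return math.exp(-math.exp(-alpha * (u - u0)))


def return_period(x, **params):
    """Mean number of years between annual maxima exceeding x."""
    F = max_flow_cdf(x, **params)
    return math.inf if F >= 1.0 else 1.0 / (1.0 - F)


params = dict(a=100.0, k=1.0, alpha=2.0, u0=-3.0)   # hypothetical values only
for flow in (50.0, 90.0, 99.0):
    print(flow, round(return_period(flow, **params), 1))
```

Return periods grow without bound as the flow approaches $a$, which is the qualitative behaviour that distinguishes this model from one ignoring the upper limit.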


Additive Partition Functions. J. Wolfowitz, New York City.

Let $n_1$ and $n_2$ be positive integers and let

$$m = \max\left(\frac{n_1}{n_1 + n_2}, \frac{n_2}{n_1 + n_2}\right).$$

Let the stochastic variable $V = (y_1, y_2, \ldots, y_s)$ be any sequence of positive integers such that $y_1 + y_3 + y_5 + \cdots$ is equal to either one of $n_1$ and $n_2$, while $y_2 + y_4 + y_6 + \cdots$ is equal to the other. Two sequences $V$ with the same elements arranged in different order are to be considered distinct, and all sequences $V$ are to be assigned the same probability. Such sequences are of statistical importance (Wald and Wolfowitz, Annals of Math. Stat., Vol. 11 (1940)). Let $f(x)$ be a function defined for all positive integral values of $x$ which fulfills the following conditions:

1. There exists a pair of positive integers, $a$ and $b$, such that

$$\frac{f(a)}{a} \neq \frac{f(b)}{b}.$$

2. The series

$$\sum_{i=1}^{\infty} |f(i)|\, m^{i}$$

is convergent.

Then, as $n_1$ and $n_2 \to \infty$ while $n_1/n_2$ remains constant, the distribution of the stochastic variable

$$F(V) = \sum_{i=1}^{s} f(y_i)$$

approaches the normal distribution. When $f(x) \equiv 1$, $F(V) = U(V)$ (loc. cit., Theorem I). When $f(x) = \log x$, $F(V)$ is a statistic introduced by the author (Amer. Math. Soc. Bull. (1941), p. 216).

A similar result holds for partitions of a single integer.
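The statistic $F(V)$ is easy to simulate. The sketch below, with hypothetical helper names, draws random arrangements of $n_1$ symbols of one kind and $n_2$ of another, reads off the run lengths $y_1, y_2, \ldots$, and sums $f(y_i)$. Each arrangement corresponds to exactly one sequence $V$ (to two arrangements when $n_1 = n_2$), so uniformly random arrangements give uniformly distributed $V$; with $f(x) \equiv 1$ the statistic reduces to the number of runs $U(V)$.

```python
import math
import random
from itertools import groupby


def run_lengths(seq):
    """Lengths y_1, y_2, ... of the successive runs in a two-symbol sequence."""
    return [len(list(group)) for _, group in groupby(seq)]


def partition_statistic(seq, f):
    """F(V) = sum of f(y_i) over the run lengths V = (y_1, ..., y_s)."""
    return sum(f(y) for y in run_lengths(seq))


def simulate(n1, n2, f, trials=2000, seed=1):
    """Sample F(V) under random arrangement of n1 a's and n2 b's."""
    rng = random.Random(seed)
    symbols = ["a"] * n1 + ["b"] * n2
    values = []
    for _ in range(trials):
        rng.shuffle(symbols)
        values.append(partition_statistic(symbols, f))
    return values


print(run_lengths("aabbba"), partition_statistic("aabbba", lambda y: 1))  # prints [2, 3, 1] 3
sample = simulate(60, 40, math.log, trials=500)   # f(x) = log x
print(sum(sample) / len(sample))
```

A histogram of such samples for growing $n_1$, $n_2$ at a fixed ratio is a quick way to see the approach to normality asserted in the theorem.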

II. Presented on December 29, 1941, at the joint session of the Institute, 
The Econometric Society, and Section A of the A. A. A. S. 

Certain Tests for Randomness Applied to Data Grouped into Small Sets. 

Edward L. Dodd, University of Texas. 

G. Udny Yule, in his paper A Test of Tippett's Random Sampling Numbers (Roy. Stat. Soc. Jour., Vol. 101 (1938), pp. 167-172), described tests applied to certain sums of the Tippett numbers. Yule regarded the Tippett numbers as not altogether satisfactory.

The tests now to be described, however, involve no summation. For sets of three digits, four classes may be distinguished: the middle number may be the largest, or it may be the least; or the sequence may be monotone increasing or monotone decreasing. Here the sequence $a, a, a$ may be classified with the monotone increasing sequences when $a > 4$; otherwise, with the monotone decreasing sequences. Similarly, six consecutive digits in two sets of three digits each give rise to sixteen classes. On the basis of range, sets of two or more of the digits 0, 1, 2, ..., 9 may be separated into ten classes.

Chi-square tests applied by the present author on the basis of the foregoing and similar 
classifications have not thus far indicated that the Tippett numbers are not satisfactorily 
random. 
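The three-digit classification lends itself to a short chi-square computation. In the sketch below (hypothetical function names), triples $a, a, a$ follow the rule stated above; treating other ties, such as $1, 1, 2$, as monotone is our assumption, since the abstract does not say how they were handled. Digits are taken in non-overlapping triples, and expected frequencies are found by enumerating all 1000 equally likely triples; under this convention the four class probabilities come out as 0.285, 0.285, 0.215, and 0.215.

```python
from collections import Counter
from itertools import product

CLASSES = ("peak", "trough", "increasing", "decreasing")


def classify(a, b, c):
    """Assign a triple of digits to one of the four classes."""
    if a == b == c:
        return "increasing" if a > 4 else "decreasing"   # rule for a, a, a
    if a <= b <= c:
        return "increasing"   # assumption: other ties count as monotone
    if a >= b >= c:
        return "decreasing"
    # Not monotone: the middle digit is strictly largest or strictly least.
    return "peak" if b > max(a, c) else "trough"


def expected_probabilities():
    """Class probabilities when the three digits are independent and uniform."""
    counts = Counter(classify(*t) for t in product(range(10), repeat=3))
    return {k: counts[k] / 1000 for k in CLASSES}


def chi_square(digits):
    """Chi-square statistic (3 degrees of freedom) over non-overlapping triples."""
    triples = [tuple(digits[i:i + 3]) for i in range(0, len(digits) - 2, 3)]
    observed = Counter(classify(*t) for t in triples)
    probs = expected_probabilities()
    n = len(triples)
    return sum((observed[k] - n * probs[k]) ** 2 / (n * probs[k]) for k in CLASSES)


print(expected_probabilities())
```

The $a > 4$ rule splits the ten constant triples evenly, which is what makes the increasing and decreasing classes equally likely.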

Stratified Sampling. A. M. Mood, University of Texas. 

When certain relations between the probabilities $p_1, p_2, \ldots, p_k$ of a multinomial population are known in advance, the technique of stratified sampling provides more efficient estimates of the probabilities than does random sampling. Under certain conditions of stratified sampling, however, the maximum likelihood estimates, $n_i/n$, of $p_i$ are biased, but are unbiased in the limit as the sample size increases. The methods and results of the theory of maximum likelihood require no modification to be made applicable to the problem of estimation in stratified sampling; in fact the results of this theory imply the use of stratified sampling when the conditions for its use obtain.

Advantages of Singling Out Degrees of Freedom in Analyses of Variance.

William Dowell Baten, Michigan Agricultural Experiment Station.

This paper pertains to an experiment involving dummy plots for analyzing effects of placements and fertilizers for cannery peas. Three fertilizers were used at different distances from the pea seeds at planting, the design being a randomized block layout. Advantages are given for breaking up the sum of squares due to differences between "treatment" means into sums of squares, each with one degree of freedom. Methods are given for securing the sum of squares involving dummy plots, and for obtaining the variances due to main effects and interaction. Interpretations are given for each phase of the analysis.
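The single-degree-of-freedom partition can be sketched as follows. This is a minimal illustration, not Baten's computation. The treatment totals are hypothetical, and every treatment is assumed to have the same number of replicates r. Helmert contrasts stand in for whatever orthogonal comparisons the actual fertilizer-placement design calls for. A contrast $L = \sum c_i T_i$ with $\sum c_i = 0$ contributes $L^2 / (r \sum c_i^2)$. For a complete set of $t - 1$ mutually orthogonal contrasts, these pieces add up to the treatment sum of squares.

```python
def treatment_ss(totals, r):
    """Treatment sum of squares from treatment totals T_i, each based on r plots."""
    t = len(totals)
    grand = sum(totals)
    return sum(T * T for T in totals) / r - grand * grand / (r * t)


def contrast_ss(totals, coeffs, r):
    """Sum of squares (one degree of freedom) for the contrast sum(c_i * T_i)."""
    if sum(coeffs) != 0:
        raise ValueError("contrast coefficients must sum to zero")
    L = sum(c * T for c, T in zip(coeffs, totals))
    return L * L / (r * sum(c * c for c in coeffs))


def helmert_contrasts(t):
    """t - 1 mutually orthogonal contrasts: each level against the mean of the earlier ones."""
    return [[1] * k + [-k] + [0] * (t - k - 1) for k in range(1, t)]


# Hypothetical example: three treatments, four blocks each.
totals = [10, 14, 21]
r = 4
pieces = [contrast_ss(totals, c, r) for c in helmert_contrasts(len(totals))]
print(pieces, sum(pieces), treatment_ss(totals, r))  # prints [2.0, 13.5] 15.5 15.5
```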





Contents

The Progeny of an Entire Population. Alfred J. Lotka
Asymptotically Shortest Confidence Intervals. Abraham Wald
Grouping Methods. P. S. Dwyer
On the Correct Use of Bayes' Formula. R. v. Mises
An Iterative Method of Adjusting Sample Frequency Tables when Expected Marginal Totals Are Known. Frederick F. Stephan
Tabulation of the Probabilities for the Ratio of the Mean Square Successive Difference to the Variance. B. I. Hart and John von Neumann
Cumulative Frequency Functions. Irving W. Burr

Notes:
An Approximate Normalization of the Analysis of Variance Distribution. Edward Paulson
Note on the Distribution of Roots of a Polynomial with Random Complex Coefficients. M. A. Girshick
A Note on the Probability of Arbitrary Events. Hilda Geiringer
An Inequality for Mill's Ratio. Z. W. Birnbaum

Vol. XIII, No. 2, June, 1942








THE PROGENY OF AN ENTIRE POPULATION¹

By Alfred J. Lotka 
Metropolitan Life Insurance Company 

The literature on renewal theory has grown to considerable dimensions, until even admittedly incomplete bibliographies list over 100 titles. But a surprisingly small proportion of these publications exhibits any practical applications to concrete data, and such applications as have been made (e.g., by Wicksell, Hadwiger, Rhodes) are for the most part of restricted scope.

Anyone who has been following the development will, I think, feel that this is unfortunate. It has a double disadvantage. On the one hand, the purely theoretical discussions emphasize difficulties which in practice may be relatively unimportant. These difficulties are inherent either in some of the unrealistic ad hoc examples discussed, or in the expressions used to fit smooth curves to the basic data, rather than in these data themselves. On the other hand, some real difficulties in application to actual data seem to require further clarification.

Several of the applications that have been made, including some of my own, are restricted to following up the "progeny" of a "population element." Such an element comprises only individuals all originating at the same time and therefore all of the same age (in the case of industrial equipment, installations all made at one point of time). The analysis set forth in the treatment of this special case is competent also to deal with the practically more important case of the progeny of an initial population of given age distribution, though no example of this has hitherto been published.² Such an example will now be given, and at the same time this will afford an opportunity to clarify some points in the presentation of the more general case.

Let $N_t$ be the total number of females at time $t$, and $N_t c_t(a)\,da$ the number comprised within the age limits $a$ and $a + da$. Also, let $m_t(a)$ be the age-specific fertility of females of age $a$, counting daughters only. If $\alpha$ and $\omega$ are, respectively, the lower and the upper limit of the female reproductive period, and $B(t)$ the annual births of females, then

$$B(t) = \int_{\alpha}^{\omega} N_t\, c_t(a)\, m_t(a)\, da. \tag{1}$$

However, it is not in this perfectly general form that the relation is to be applied. The case to be considered is that in which the "initial" population is, throughout its "future" development, subject to constant age-specific fertility and mortality. If we denote the "initial" time by $t = \omega$ (which we can do since the zero of time is arbitrary), we can then write

$$B(t) = \int_{\alpha}^{\omega} N_t\, c_t(a)\, m_\omega(a)\, da, \qquad t > \omega. \tag{2}$$

¹ Compare A. J. Lotka, "The progeny of a population element," Am. Jour. Hygiene, Vol. 8 (1928), p. 875.

² An example was given by the writer in an oral communication to the Eighth American Scientific Congress, May 1940, the Proceedings of which have not so far been published.


Also, if $p_{t-a}(a)$ is the probability for a female born at time $t - a$ of surviving to time $t$, being then $a$ years old, we have

$$B(t - a)\, p_{t-a}(a) = N_t\, c_t(a), \tag{3}$$

and, in particular, since in the case under consideration $p_{t-a}(a)$ is constant for $t - a > \omega$, i.e., for individuals born after $t = \omega$,

$$B(t - a)\, p_\omega(a) = N_t\, c_t(a), \qquad t > a + \omega. \tag{4}$$

Now, for the "future" values of $m_t(a)$ and $p_{t-a}(a)$, we have been at liberty to make the arbitrary assumption that they retain their values as of $t = \omega$ and $t - a > \omega$, respectively. But for the "past" of the system under consideration we do not have equal liberty, for any assumption we make must be compatible with

(a) the initial age distribution,

(b) equation (1).

We can, however, within these limitations, assume that (4) still holds for $t - a > 0$, i.e., for individuals born after $t = 0$, thus

$$B(t - a)\, p_\omega(a) = N_t\, c_t(a). \tag{5}$$

Introducing this in (1) we have

$$B(t) = \int_{\alpha}^{\omega} B(t - a)\, p_\omega(a)\, m_t(a)\, da, \qquad t > 0. \tag{6}$$


But we cannot now further assume that

$$m_t(a) = m_\omega(a), \qquad t > 0, \tag{7}$$

for, in general, this would make (6) incompatible with (1).

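We can, however, split the integral in (6) into two parts, thus

$$B(t) = \int_{t}^{\omega} B(t - a)\, p_\omega(a)\, m_t(a)\, da + \int_{\alpha}^{t} B(t - a)\, p_\omega(a)\, m_\omega(a)\, da, \tag{8}$$

with the assumption, only in the range $a < t$,

$$m_t(a) = m_\omega(a), \qquad a < t. \tag{9}$$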

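Denoting the first integral in (8) by $F(t)$, and contracting $p_\omega(a)\, m_\omega(a)$ to $\varphi_\omega(a)$, we may write (8) in the form

$$B(t) = F(t) + \int_{\alpha}^{t} B(t - a)\, \varphi_\omega(a)\, da, \tag{10}$$

or, writing $\beta(t)$ for the integral in (10),

$$B(t) = F(t) + \beta(t), \tag{11}$$

with

$$F(t) = 0, \quad t > \omega; \qquad F(t) = B(t), \quad 0 < t < \alpha, \tag{12}$$

and, for $t > \omega$,³

$$B(t) = \int_{\alpha}^{\omega} B(t - a)\, \varphi_\omega(a)\, da, \qquad t > \omega. \tag{13}$$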

The assumption (9) has a definite physical meaning. The integral in (6) has been so split that the first part, $F(t)$, gives the births of daughters from mothers who themselves were born before $t = 0$. The second part, $\beta(t)$, gives the births of daughters from mothers born after $t = 0$. Equation (9) therefore expresses the assumption that for mothers born at or after $t = 0$, the age-specific fertilities for ages $a < t$ have the same values $m_\omega(a)$, independent of $t$, as prevail for $t = \omega$. But at time $t$ there are no mothers of age $a > t$ who were born after $t = 0$. Hence the assumption (9) can be quite simply stated: the age-specific fertilities $m_\omega(a)$ apply to all mothers born after time $t = 0$. This assumption cannot, in general, be made for mothers born before $t = 0$. It would not, in general, be compatible with the given initial age distribution and at the same time with assumption (5). Hence in the first integral of (8), denoted by $F(t)$ in (10), we must write $m_t(a)$, not $m_\omega(a)$.
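Once $F(t)$ and $\varphi_\omega(a)$ are known, equations (10) to (13) can also be evaluated directly by marching forward in time. This is essentially the "step by step" alternative to the series solution. The following is a minimal sketch under assumptions not in the text. Time and age are taken in whole years, so the integral in (10) becomes a sum over integer ages from $\alpha$ to $\min(t, \omega)$. The function and argument names are hypothetical.

```python
def solve_renewal(F, phi, alpha, omega, horizon):
    """Discrete analogue of (10): B(t) = F(t) + sum over a = alpha..min(t, omega) of B(t - a) * phi(a).

    F   -- list of F(t) for t = 0, 1, 2, ...; F(t) is taken as 0 beyond its end,
           in line with (12) for t > omega
    phi -- dict mapping whole-year age a to phi_omega(a) = p_omega(a) * m_omega(a)
    """
    if alpha < 1:
        raise ValueError("alpha must be at least one year so each B(t) uses only earlier births")
    B = []
    for t in range(horizon + 1):
        f = F[t] if t < len(F) else 0.0
        renewal = sum(B[t - a] * phi.get(a, 0.0) for a in range(alpha, min(t, omega) + 1))
        B.append(f + renewal)
    return B


# Toy check: one unit of "initial" births, with all daughters borne at age 2.
print(solve_renewal([1.0], {2: 0.5}, alpha=2, omega=2, horizon=6))
# prints [1.0, 0.0, 0.5, 0.0, 0.25, 0.0, 0.125]
```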

Equation (10) is of the form discussed by G. Herglotz.⁴ He writes its solution, for $t > 0$, in the form of an exponential series,

$$B(t) = \sum_j Q_j\, e^{r_j t}, \tag{14}$$

where the exponents $r_j$ are the roots of the characteristic equation

$$\psi(r) = \int_{\alpha}^{\omega} e^{-ra}\, \varphi_\omega(a)\, da = 1, \tag{15}$$
while the coefficients $Q_j$ are given by

$$Q_j = \frac{\int_{0}^{\omega} F(t)\, e^{-r_j t}\, dt}{\int_{\alpha}^{\omega} a\, e^{-r_j a}\, \varphi_\omega(a)\, da}. \tag{16}$$
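For a concrete maternity function, the real root of (15) can be located numerically. Because $\varphi_\omega(a) \ge 0$, $\psi(r)$ is strictly decreasing in $r$, so simple bisection suffices. This is a sketch only. It assumes $\varphi_\omega$ is given at whole-year ages, so that the integral in (15) becomes a sum. The example maternity values are hypothetical. The complex roots, which call for a search in the complex plane, are not attempted here.

```python
import math


def psi(r, phi):
    """Left side of (15), with the integral replaced by a sum over whole-year ages."""
    return sum(weight * math.exp(-r * a) for a, weight in phi.items())


def real_root(phi, lo=-0.5, hi=0.5, tol=1e-12):
    """Unique real root of psi(r) = 1 by bisection (psi decreases in r when phi >= 0)."""
    while psi(lo, phi) < 1.0:   # widen the bracket to the left
        lo *= 2
    while psi(hi, phi) > 1.0:   # widen the bracket to the right
        hi *= 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if psi(mid, phi) > 1.0:
            lo = mid            # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical net maternity function concentrated at ages 20 to 35.
phi = {20: 0.25, 25: 0.40, 30: 0.35, 35: 0.15}
r = real_root(phi)
print(round(r, 5), round(psi(r, phi), 9))
```

With all the weight at a single age $A$, the root reduces to $\ln(R_0)/A$, where $R_0$ is the total weight. This gives a convenient check.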


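There is only one real root of (15), since $\varphi_\omega(a) \ge 0$ for all values of $a$. For complex roots it is convenient to write the corresponding terms of the series (14) in trigonometric form. If $r = u \pm iv$ is a pair of conjugate complex roots, with coefficients $Q = U \pm iV$, the two terms together give

$$Q e^{rt} + \bar{Q} e^{\bar{r} t} = 2U e^{ut} \cos vt - 2V e^{ut} \sin vt, \tag{17}$$

$$Q e^{rt} + \bar{Q} e^{\bar{r} t} = 2\sqrt{U^2 + V^2}\, e^{ut} \cos(vt + \theta), \tag{18}$$

³ Since $\varphi_\omega(a) = 0$ for $a > \omega$.

⁴ Math. Annalen, Vol. 65 (1908), pp. 87 et seq.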



US 

VI.FIiUJ 3. I f ITKA 

whore 

tan o ■— r r, 

(10) 

r 

™“" " Vn ■- I' 3 ’ 

* " vn i r 3 ' 

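and

$$U = \frac{RG + SH}{G^2 + H^2}, \tag{20}$$

$$V = \frac{RH - SG}{G^2 + H^2}, \tag{21}$$

in which

$$G = \int_{\alpha}^{\omega} a\, e^{-ua} \cos va\; \varphi_\omega(a)\, da, \tag{22}$$

$$H = \int_{\alpha}^{\omega} a\, e^{-ua} \sin va\; \varphi_\omega(a)\, da, \tag{23}$$

$$R = \int_{0}^{\omega} e^{-ut} \cos vt\; F(t)\, dt, \tag{24}$$

$$S = \int_{0}^{\omega} e^{-ut} \sin vt\; F(t)\, dt. \tag{25}$$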
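As a numerical check on (17) to (21), the sketch below computes $U$ and $V$ from $R$, $S$, $G$, $H$. It then confirms that the cosine-sine form (17) and the amplitude-phase form (18)-(19) give the same contribution.

The inputs come from the first oscillatory column of Table I: $R = -17.863 \times 10^5$, $S = -31.508 \times 10^5$, $G = 25.768$, $H = 14.938$, $u = -0.0380$ and $v = 0.21448$. The powers of ten attached to these entries are an assumption of this sketch. The function names are illustrative.

```python
import math


def pair_coefficients(R, S, G, H):
    """U and V of one conjugate pair of roots, by (20) and (21)."""
    d = G * G + H * H
    return (R * G + S * H) / d, (R * H - S * G) / d


def pair_term(t, U, V, u, v):
    """Joint contribution of the pair at time t, in the cosine-sine form (17)."""
    return 2.0 * math.exp(u * t) * (U * math.cos(v * t) - V * math.sin(v * t))


def amplitude_phase(U, V):
    """Amplitude factor 2*sqrt(U^2 + V^2) and phase theta of (18)-(19)."""
    return 2.0 * math.hypot(U, V), math.atan2(V, U)


# First oscillatory component (Table I entries, powers of ten as assumed above).
U, V = pair_coefficients(R=-17.863e5, S=-31.508e5, G=25.768, H=14.938)
u, v = -0.0380, 0.21448
amp, theta = amplitude_phase(U, V)
for t in (0, 25, 55):  # 1865, 1890 and 1920, with t = 0 at 1865
    print(t, round(pair_term(t, U, V, u, v)), round(amp * math.exp(u * t) * math.cos(v * t + theta)))
```

The recomputed $U$ and $V$ agree with the tabulated $-10.494 \times 10^4$ and $61.442 \times 10^3$ to within about 0.01 percent.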

For purposes of numerical application to the problem here considered, we must express the annual births $B(t)$ for $t < \omega$ in terms of the given "initial" age distribution at time $\omega$.

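We have, generally,

$$B(t - a) = \frac{N_t\, c_t(a)}{p_{t-a}(a)} = \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}, \tag{26}$$

since individuals of age $a$ at time $t$ are $a + \omega - t$ years old at time $\omega$. Introducing the relation (26) in (10) we have

$$B(t) = F(t) + \int_{\alpha}^{t} \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\, p_\omega(a)\, m_\omega(a)\, da, \tag{27}$$

and

$$F(t) = B(t) - \int_{\alpha}^{t} \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\, \varphi_\omega(a)\, da, \tag{28}$$

$$F(t) = \frac{N_\omega\, c_\omega(\omega - t)}{p_\omega(\omega - t)} - \int_{\alpha}^{t} \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\, \varphi_\omega(a)\, da, \tag{29}$$

$$F(t) = B(t) - \beta(t). \tag{11a}$$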





Note that, in computing the integral $\beta(t)$ for any particular value of $t$, the argument of the function $c_\omega$ runs from $\alpha + \omega - t$ to $\omega$. Suppose, for example, that the zero of time is 1865 and $t = \omega$ is at 1920. Then, in computing $F(35)$, i.e., the value of $F$ for 1900, the argument of $c_\omega$ in the integral ranges from $10 + 55 - 35$ to $55$, i.e., from 30 to 55.
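The computation of $F(t)$ from the initial age distribution, via (5a) and (28), can be sketched in discrete form. This is an illustration under stated assumptions: ages and years are whole numbers, the integrals become sums, the toy population, survival and maternity values are hypothetical, and the function names are invented for the example. For $t < \alpha$ the sum defining $\beta(t)$ is empty, so $F(t) = B(t)$, as (12) requires.

```python
def births_by_survival(pop_omega, surv, omega):
    """Relation (5a), B(omega - a) = N_omega c_omega(a) / p_omega(a), for t = 0..omega.

    pop_omega -- dict: whole-year age a -> number aged a at time omega
    surv      -- dict: age a -> p_omega(a), probability of surviving from birth to age a
    """
    return {t: pop_omega[omega - t] / surv[omega - t] for t in range(omega + 1)}


def auxiliary_F(pop_omega, surv, phi, alpha, omega):
    """Discrete version of (28): F(t) = B(t) - beta(t) for t = 0..omega."""
    B = births_by_survival(pop_omega, surv, omega)
    beta, F = {}, {}
    for t in range(omega + 1):
        # beta(t): daughters born at t to mothers themselves born at or after t = 0;
        # B[t - a] equals N_omega c_omega(a + omega - t) / p_omega(a + omega - t), as in (26).
        beta[t] = sum(B[t - a] * phi.get(a, 0.0) for a in range(alpha, t + 1))
        F[t] = B[t] - beta[t]
    return B, beta, F


def ages_used(t, alpha, omega):
    """Range of the argument of c_omega entering beta(t)."""
    return alpha + omega - t, omega


# Toy data (hypothetical): omega = 6, alpha = 2.
pop = {a: 100 - 10 * a for a in range(7)}
surv = {a: 1 - 0.05 * a for a in range(7)}
phi = {2: 0.3, 3: 0.4, 4: 0.2, 5: 0.1, 6: 0.05}
B, beta, F = auxiliary_F(pop, surv, phi, alpha=2, omega=6)
print(ages_used(35, 10, 55))  # prints (30, 55), as in the 1900 example in the text
```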

Numerical Example. By way of a numerical illustration, these principles will now be applied to a concrete case. We shall start with the age distribution of the white female population of the United States as constituted in 1920. For this population, previous publications furnish some of the required data, including the real root and the first three pairs of complex roots of the characteristic equation.

From this "initial" age distribution in 1920 it is necessary first of all to compute the auxiliary function $F(t)$ for the 55 years prior to 1920. The first term, $B(t)$, in the right-hand member of (28) is very easily computed for successive values of $t$ from the relation (5a). This relation simply expresses the fact that persons $a$ years old in the year $\omega$, i.e., 1920, are the survivors of the $B(\omega - a)$ persons born in the year $\omega - a$:

$$N_\omega\, c_\omega(a) = B(\omega - a)\, p_\omega(a). \tag{5a}$$

In the diagram, Fig. 1, which is drawn in stereographic projection, the age distribution of the (white female) population of the United States in 1920 is plotted in a plane reaching forward at right angles to the plane of the paper. Successive points of $B(t)$ for $0 < t < \omega$ have been computed "by survivals" according to (5a). They are plotted as a curve in the plane of the paper, "at the back" of the diagram. For a selected point, namely age 25 in 1920, the arrows indicate the path of the computation according to equation (5a).

The second term, $\beta(t)$, in the expression (11a) for $F(t)$ was computed on the basis of the relation (28). The inputs were the age distribution in 1920, the rates of survival from previous years into 1920,⁵ and the age-specific fertility at each age in the reproductive period, 10 to 55. This second term in the expression for $F(t)$ was computed for every fifth calendar year back of 1920 to 1875 and interpolated for intervening years.⁶ The results were also plotted as a curve in the rear plane of the diagram. The shaded area in the curve for the age distribution in 1920, and the arrows leading from this shaded area to the curve

$$\beta(t) = \int_{\alpha}^{t} B(t - a)\, \varphi_\omega(a)\, da \tag{10, 11}$$

$$\beta(t) = \int_{\alpha}^{t} \frac{N_\omega\, c_\omega(a + \omega - t)}{p_\omega(a + \omega - t)}\, \varphi_\omega(a)\, da, \tag{29, 11a}$$

indicate in this case the path of the computation according to equation (28).

⁵ Using the Foudray life table for white females in 1919-1920. In the first quinquennial age group, the following values were used:

p(0.5) = .9400   p(1.5) = .9235   p(2.5) = .9135   p(3.5) = .9080   p(4.5) = .9040

⁶ This term vanishes for $t < 10$, i.e., back of 1875.





From these two curves, taking differences, the curve of $F(t) = B(t) - \beta(t)$ was plotted, as shown.

With the values of $F(t)$ thus obtained, we may proceed, by formulae (14) to (25), to compute values of $B(t)$ for all values of $t > 0$. For the period 1865 to 1920, corresponding to $0 < t < \omega$, this merely means that we have an analytical expression to fit what is essentially a fundamental datum of the problem. For values of $t > \omega$, the formula gives us a continuation of the function $B(t)$ for all future time, so long as the given age-specific fertility and mortality hold.



Fig. 1. Graph illustrating computation of auxiliary function $F(t)$ from "initial" age distribution.


The final results of this computation are exhibited in Figs. 2, 3 and 4. Of these, Fig. 2 exhibits the first, second and third oscillatory components for the period from 1890 forward. It will be seen that the waves are heavily damped, so that after a relatively short period the aperiodic component dominates the course of events.

Fig. 3 covers the years from 1865 to 1920, i.e., the period $0 < t < \omega$. It exhibits the aperiodic component (as a dashed line) and, as small circles, the sum of this component plus the three oscillatory components. It will be seen that from about 1890 forward, the points so obtained follow rather closely the values of $B(t)$ derived by survivals from the age distribution in 1920.




[Figure: abscissa, calendar year.]
Fig. 2. First three oscillatory components of total annual births.

[Figure: abscissa, calendar year, 1865 to 1920; circles mark the sum of the first four components.]
Fig. 3. Graph of functions $B(t)$ and $F(t)$ for $0 < t < \omega$, i.e., for 1865 to 1920, together with aperiodic component; also, summation of aperiodic and first three oscillatory components.




[Figure: ordinate in millions; abscissa, calendar year, 1890 to 1990.]
Fig. 4. Sum of aperiodic and three oscillatory terms of series solution compared with results of "step by step" computation of annual births.


TABLE I

Constants of the Series Solution (14) of Integral Equation (10) to Third Oscillatory Component Inclusive; t = 0 at 1865

Function   Aperiodic         Oscillatory components
           component         First              Second             Third
u           .543 × 10^-2     -.380 × 10^-1      -8.731 × 10^-2     -9.801 × 10^-2
v          0                 21.448 × 10^-2     31.542 × 10^-2     48.840 × 10^-2
G          28.226            25.768             51.225             37.008
H          0                 14.938             -18.637            17.206
R          23.262 × 10^6     -17.863 × 10^5     -37.196 × 10^6     11.684 × 10^6
S          0                 -31.508 × 10^5     16.827 × 10^6      -16.543 × 10^6
U          82.416 × 10^4     -10.494 × 10^4     -74.679 × 10^4     88.014 × 10^3
V          0                 61.442 × 10^3      -56.787 × 10^3     48.808 × 10^4





Prior to about 1890, four components alone are quite inadequate, and the corresponding points have been omitted from the diagram. The lack of concordance, with such limited components, is inconsequential in this part of the series. The purpose of this part of the work was merely to compute the auxiliary function $F(t)$, and the fit obtained for $B(t)$ in this range, so far as it goes, is merely a by-product. The main interest lies in the course of $B(t)$ for $t > \omega$, i.e., in the years following 1920.

This course is charted in Fig. 4. There, the points obtained by the series solution (14) of (10) are again shown as small circles. The fully drawn curve is derived from my previous publication "The Progressive Adjustment of Age Distribution to Fecundity."⁷ The annual births in that case were obtained "step by step," by computing age distributions by survivals for successive quinquennial periods. In each case, the values of the reproductivity $m_\omega(a)$ were applied to the reproductive age groups.

TABLE II

United States White Female Population, 1920, Observed; Also, the Same Projected Forward for Later Years⁸

Year    Population,    Births,      Birth rate per 1,000
        thousands      thousands    per annum
1920    46,390         1,082        23.32
1930    51,727         1,162        22.46
1940    56,910         1,252        22.00
1950    61,639         1,307        21.20
1960    65,835         1,379        20.95
1970    69,829         1,465        20.98
1975    71,828         1,504        20.94
1980    73,850         1,543        20.89
1985    75,902         1,584        20.87

It will be seen that the points obtained by the solution (14) follow very closely those computed "step by step." In the computation of the latter, however, an approximation was made, using pivotal values of $p_\omega(a)$ for the several quinquennial age groups. A slight error introduced in this way would tend to be cumulative. It perhaps accounts for the fact that towards the end of the period covered (1985) the two sets of values diverge slightly. Even so, in 1985 the divergence is only about 0.4 percent.

The series solution has, of course, the advantage that it gives directly the result for any particular point of time. The "step by step" method, by contrast, requires the computation of the annual births for all intervening points in order to obtain the result for the chosen point of time.

⁷ Jour. Washington Acad. Sci., Vol. 16 (1926), p. 605.

⁸ Calculated step by step from survival ratios and age-specific fertilities, both held constant as of 1920 (reproduced for ready reference from Jour. Wash. Acad. Sci., Vol. 16, p. 605).

Furthermore, the series tells us at once the nature of the course of events. It is a trend proceeding in geometric progression, upon which is superposed a series of damped oscillations. The fundamental of these oscillations has a wave length approximately equal to the mean length of one generation from mother to daughter, i.e., about 28 years.
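The wave length and the degree of damping of each oscillatory component can be read off directly from its pair of roots. The sketch below does this for the three complex pairs. It takes $u$ and $v$ from Table I as $u = -0.0380, -0.08731, -0.09801$ and $v = 0.21448, 0.31542, 0.48840$; the powers of ten attached to those entries are an assumption of this sketch.

On these values, the fundamental has a wave length of about 29 years. Its amplitude falls to roughly a third over each wave.

```python
import math

# u and v for the three complex pairs, from Table I (powers of ten as assumed).
pairs = {
    "first": (-0.0380, 0.21448),
    "second": (-0.08731, 0.31542),
    "third": (-0.09801, 0.48840),
}


def wave_length(v):
    """Period, in years, of the factor cos(vt + theta) in (18)."""
    return 2 * math.pi / v


def damping_per_wave(u, v):
    """Factor by which exp(ut) shrinks over one full wave length."""
    return math.exp(u * wave_length(v))


for name, (u, v) in pairs.items():
    print(f"{name:6s} wave length {wave_length(v):5.1f} years, "
          f"amplitude ratio per wave {damping_per_wave(u, v):.2f}")
```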

Alternative procedure. The procedure set forth in the preceding sections involves arbitrary assumptions regarding the values of $p(a)$ and $m(a)$ for "future" time, which are fundamental to the problem under consideration. But it also involves further incidental assumptions regarding their values prior to the "initial" condition, at the instant denoted by $t = \omega$. These incidental assumptions are in a sense superfluous, since the future history of the system is completely determined by the initial age distribution and the assumed "future" values of $p(a)$ and $m(a)$. The additional assumptions were introduced merely for the purpose of translating the initial age distribution into a series of values of $B(t)$ for $0 < t < \omega$, i.e., prior to the given initial age distribution.

In actual fact, the age distribution at time $t = \omega$ did not arise in the manner assumed. Both $p(a)$ and $m(a)$ undoubtedly varied in the period 1865 to 1920, and migration also affected the situation. The quantity $F(t)$ introduced in equation (10) is, in fact, a purely auxiliary function having no direct relation to the biological events at time $t < \omega$.

An alternative procedure would avoid these conflicts and introduce assumptions only regarding "future" values of $p(a)$ and $m(a)$. It would be to compute $B(t)$ step by step over the period from $B(1920)$ to $B(1920 + \omega) = B(1975)$.

Placing the zero of time at 1920, this would give $B(t)$ for $0 < t < \omega$. For $t > \omega$ we should have, simply,

$$B(t) = \int_{\alpha}^{\omega} B(t - a)\, \varphi_{1920}(a)\, da, \qquad t > \omega,$$

using, in the evaluation of the integral, the values of $B(t - a)$ obtained by the step by step process.

We could here also split the integral into two parts:

$$B(t) = \int_{t}^{\omega} B(t - a)\, \varphi_{1920}(a)\, da + \int_{\alpha}^{t} B(t - a)\, \varphi_{1920}(a)\, da = F(t) + \int_{\alpha}^{t} B(t - a)\, \varphi_{1920}(a)\, da.$$

But the function $\varphi_{1920}(a)$ is now the same in the two integrals, and there is no occasion, in this case, for distinguishing the two parts of the integral.

If this procedure is adopted, its application to the course of $B(t)$ for $t > \omega$, i.e., beyond 1975, is of minor interest. By that time $B(t)$ has practically settled down to the aperiodic (exponential) component, the oscillations being greatly damped. The major interest of a computation carried out by this procedure would lie in fitting a series of the form (14) to the function $B(t)$ in the range 1920 to 1975. In this setting, $B(t)$ figures as a known "arbitrary" function.

Of the two alternative procedures, the one carried out in detail in the text and the numerical example is of greater interest, as exhibiting in greater generality the application of the Hertz-Herglotz solution.

BIBLIOGRAPHY 

A few more titles may be added to previously published bibliographies (see these Annals, Vol. 10 (1939), p. 22; Vol. 12 (1941), p. 266).

1936 W. Möschler, "Untersuchungen über Eintrittsgewinn und Fehlbetrag einer Versicherungskasse," Bull. de l'Association des Actuaires Suisses, October (1935), p. 129.

1936 Ali Afzalipour, Contribution à l'étude de la théorie mathématique de la démographie, Thèse, 1936.

1938 H. Hadwiger, "Ein Konvergenzkriterium für Erneuerungszahlen," Skandinavisk Aktuarietidskrift, Vol. 3-4, p. 226.

H. Hess, Anwendung der logistischen Funktion in der mathematischen Bevölkerungstheorie, Inaugural Dissertation der philosophischen Fakultät der Universität Bern.

W. M. Dawson and W. Edwards Deming, "On the problem of natural increase," Growth, Vol. 2, p. 319.

1940 A. W. Brown, "A note on the use of a Pearson Type III function in renewal theory," Annals of Math. Stat., Vol. 11 (1940), p. 448.

1940 E. Zwinggi, "Entwicklung von Personengesamtheiten: Zusammenfassender Bericht," Twelfth International Congress of Actuaries, Vol. 3, p. 263.

1940 E. Keinanen, "Über die Altersverteilung der Bevölkerung," Twelfth International Congress of Actuaries, Vol. 3, p. 305.

1940 S. S. Townsend, "Some observations on the internal variation in groups of lives assured in industrial assurance and in the general population of England and Wales," Twelfth International Congress of Actuaries, Vol. 3, p. 319.

1940 R. Tarjan, "Untersuchungen über den Kapitalbedarf des Lebensversicherungsgeschäftes," Twelfth International Congress of Actuaries, Vol. 3, p. 335.

1940 M. Presburger, "Sur l'étude générale des collectivités de personnes," Twelfth International Congress of Actuaries, Vol. 3, p. 353.

1940 A. Maret, "Direkte Berechnung der Vorgangsfunktionen einer offenen Gesamtheit," Twelfth International Congress of Actuaries, Vol. 3, p. 387.

1940 E. Zwinggi, "Über Zusammenhänge zwischen der technischen Stabilität einer Sozialversicherungskasse und der Entwicklungsformel für den Versichertenbestand," Twelfth International Congress of Actuaries, Vol. 3, p. 395.

1940 H. Hadwiger and W. Wegmüller, "Entwicklung und Umschichtung von Personengesamtheiten," Twelfth International Congress of Actuaries, p. 369.

1940 W. Dobbernack and G. Tietz, "Die Entwicklung von Personengesamtheiten vom Standpunkt der Sozialversicherungstechnik," Zwölfter Internationaler Kongress der Versicherungs-Mathematiker, Vol. 4, p. 233.

1941 R. C. Geary, "Irish population prospects considered from the viewpoint of reproduction rates," Statistical and Social Inquiry Society of Ireland, 1941.



1941 H. Hadwiger, "Eine Formel der mathematischen Bevölkerungstheorie," Mitteilungen der Vereinigung Schweizerischer Versicherungsmathematiker, Vol. 41, p. 67.

L. Féraud, "Le renouvellement, quelques problèmes connexes et les équations intégrales de cycle fermé," Mitteilungen der Vereinigung Schweizerischer Versicherungsmathematiker, Vol. 41, p. 61.

W. Feller, "On the integral equation of renewal theory," Annals of Math. Stat., Vol. 12 (1941), p. 243.

Harro Bernardelli, "Population waves," Journal of the Burma Research Society, Vol. 31.



ASYMPTOTICALLY SHORTEST CONFIDENCE INTERVALS¹

By Abraham Wald²
Columbia University

The theory of confidence intervals, based on the classical theory of probability, has been treated by J. Neyman.³ While Neyman considers the case of small samples, we shall deal here with the limit properties of the confidence intervals as the number of observations approaches infinity.

1. Definitions. We will start with some of Neyman's definitions. Let $f(x, \theta)$ be the probability density function of a variate $x$ involving an unknown parameter $\theta$. Denote by $E_n$ a point of the $n$-dimensional sample space of $n$ independent observations on $x$. Let $\rho(E_n)$ denote, for each $E_n$, a subset of the real axis. The symbol $P[\rho(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$ will then denote the probability that $\rho(E_n)$ contains $\theta'$ under the hypothesis that $\theta''$ is the true value of the parameter. Let $\underline{\theta}(E_n)$ and $\bar{\theta}(E_n)$ be two real functions defined over the whole sample space such that $\underline{\theta}(E_n) \le \bar{\theta}(E_n)$. The interval $\delta(E_n) = [\underline{\theta}(E_n), \bar{\theta}(E_n)]$ is called a confidence interval of $\theta$ corresponding to the confidence coefficient $\alpha$ $(0 < \alpha < 1)$ if $P[\delta(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$.

The interval function $\delta(E_n)$ is called a shortest confidence interval of $\theta$ corresponding to the confidence coefficient $\alpha$ if

(a) $P[\delta(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$, and

(b) for any interval function $\delta'(E_n)$ which satisfies the condition (a) we have
$$P[\delta(E_n)\;\mathrm{c}\;\theta' \mid \theta''] \le P[\delta'(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$$
for arbitrary values $\theta'$ and $\theta''$.

The interval function $\delta(E_n)$ is called a shortest unbiased confidence interval of $\theta$ if the following three conditions are fulfilled:

(a) $P[\delta(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) $P[\delta(E_n)\;\mathrm{c}\;\theta' \mid \theta''] \le \alpha$ for all values of $\theta'$ and $\theta''$.

(c) For any interval function $\delta'(E_n)$ for which the conditions (a) and (b) are satisfied, we have
$$P[\delta(E_n)\;\mathrm{c}\;\theta' \mid \theta''] \le P[\delta'(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$$
for all values of $\theta'$ and $\theta''$.
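The sketch below gives a concrete feel for conditions (a) and (b). It assumes a normal model with known $\sigma$, which the text does not specify, and the familiar interval $\bar{x} \pm z\sigma/\sqrt{n}$. It evaluates $P[\delta(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$ in closed form.

The probability equals $\alpha$ when $\theta' = \theta''$. It is smaller when $\theta' \ne \theta''$, so the interval is unbiased in the sense of (b). For a fixed false value $\theta'$, it shrinks as $n$ grows. Whether this interval is shortest in Neyman's sense is not checked here, and the function name is illustrative.

```python
import math
from statistics import NormalDist


def coverage_probability(theta_prime, theta_true, sigma, n, alpha):
    """P[delta(E_n) c theta' | theta''] for the interval xbar +/- z * sigma / sqrt(n).

    Hypothetical model: n independent N(theta'', sigma^2) observations with sigma known;
    alpha is the confidence coefficient, as in the text (e.g. 0.95).
    """
    std = NormalDist()
    z = std.inv_cdf(0.5 + alpha / 2)
    shift = (theta_prime - theta_true) * math.sqrt(n) / sigma
    # The interval covers theta' exactly when the standardized mean lies in [shift - z, shift + z].
    return std.cdf(shift + z) - std.cdf(shift - z)


for n in (25, 100):
    print(n, [round(coverage_probability(d, 0.0, 1.0, n, 0.95), 4) for d in (0.0, 0.1, 0.3)])
```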

For any relation $R$ we shall denote by $P(R \mid \theta)$ the probability that $R$ holds under the hypothesis that $\theta$ is the true value of the parameter. Similarly, for any region $Q_n$ of the $n$-dimensional sample space, the symbol $P(Q_n \mid \theta)$ will denote the probability that the sample point $E_n$ falls in $Q_n$ under the hypothesis that $\theta$ is the true value of the parameter.

¹ Presented at a joint meeting of the Institute of Mathematical Statistics and the American Mathematical Society in Hanover, September, 1940.

² Research under a grant-in-aid from the Carnegie Corporation of New York.

³ J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability," Phil. Trans. Roy. Soc. London, Vol. 236 (1937), pp. 333-380.

In all that follows we shall denote a region of the $n$-dimensional sample space by a capital letter with the subscript $n$.

A real function $\bar{\theta}(E_n)$ is called a best upper estimate of $\theta$ if the following two conditions are fulfilled:

(a) $P[\theta < \bar{\theta}(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any function $\bar{\theta}'(E_n)$ which satisfies the condition (a) we have
$$P[\theta' < \bar{\theta}(E_n) \mid \theta''] \le P[\theta' < \bar{\theta}'(E_n) \mid \theta'']$$
for all values $\theta'$ and $\theta''$ for which $\theta' > \theta''$.

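A real function $\underline{\theta}(E_n)$ is called a best lower estimate of $\theta$ if the following two conditions are fulfilled:

(a) $P[\theta > \underline{\theta}(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any function $\underline{\theta}'(E_n)$ which satisfies the condition (a) we have
$$P[\theta' > \underline{\theta}(E_n) \mid \theta''] \le P[\theta' > \underline{\theta}'(E_n) \mid \theta'']$$
for all values of $\theta'$ and $\theta''$ for which $\theta' < \theta''$.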

We will extend the above definitions of Neyman to the limit case when $n$ approaches infinity.

Definition I: A sequence of interval functions $\{\delta_n(E_n)\}$ $(n = 1, 2, \ldots, \text{ad inf.})$ is called an asymptotically shortest confidence interval of $\theta$ if the following two conditions are fulfilled:

(a) $P[\delta_n(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of interval functions $\{\delta'_n(E_n)\}$ $(n = 1, 2, \ldots, \text{ad inf.})$ which satisfies (a), the least upper bound of
$$P[\delta_n(E_n)\;\mathrm{c}\;\theta' \mid \theta''] - P[\delta'_n(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$$
with respect to $\theta'$ and $\theta''$ converges to zero as $n \to \infty$.

Definition II: A sequence of interval functions $\{\delta_n(E_n)\}$ is called an asymptotically shortest unbiased confidence interval of $\theta$ if the following three conditions are fulfilled:

(a) $P[\delta_n(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) The least upper bound of $P[\delta_n(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$ with respect to $\theta'$ and $\theta''$ converges to $\alpha$ with $n \to \infty$.

(c) For any sequence of interval functions $\{\delta'_n(E_n)\}$ which satisfies the conditions (a) and (b), the least upper bound of
$$P[\delta_n(E_n)\;\mathrm{c}\;\theta' \mid \theta''] - P[\delta'_n(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$$
with respect to $\theta'$ and $\theta''$ converges to zero with $n \to \infty$.

Definition III: A sequence of real functions $\{\bar{\theta}_n(E_n)\}$ $(n = 1, 2, \ldots, \text{ad inf.})$ is called an asymptotically best upper estimate of $\theta$ if the following two conditions are fulfilled:

(a) $P[\theta < \bar{\theta}_n(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of functions $\{\bar{\theta}'_n(E_n)\}$ which satisfies (a), the least upper bound of
$$P[\theta' < \bar{\theta}_n(E_n) \mid \theta''] - P[\theta' < \bar{\theta}'_n(E_n) \mid \theta'']$$
in the domain $\theta' > \theta''$ converges to zero with $n \to \infty$.

Definition IV: A sequence of real functions $\{\underline{\theta}_n(E_n)\}$ is called an asymptotically best lower estimate of $\theta$ if the following two conditions are fulfilled:

(a) $P[\theta > \underline{\theta}_n(E_n) \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of functions $\{\underline{\theta}'_n(E_n)\}$ which satisfies (a), the least upper bound of
$$P[\theta' > \underline{\theta}_n(E_n) \mid \theta''] - P[\theta' > \underline{\theta}'_n(E_n) \mid \theta'']$$
in the domain $\theta' < \theta''$ converges to zero with $n \to \infty$.

2. Two Propositions. Proposition I: Let $\{W_n(\theta)\}$ $(n = 1, 2, \ldots, \text{ad inf.})$ be, for each $\theta$, a sequence of regions such that the following two conditions are fulfilled:

(a) $P[W_n(\theta) \mid \theta] = 1 - \alpha$ for all values of $\theta$.

(b) For any sequence of regions $\{Z_n(\theta)\}$ which satisfies (a), the least upper bound of
$$P[Z_n(\theta') \mid \theta''] - P[W_n(\theta') \mid \theta'']$$
in the domain $\theta' > \theta''$ $(\theta' < \theta'')$ converges to zero with $n \to \infty$.

Denote by $\rho_n(E_n)$ the set of all values of $\theta$ for which $E_n$ does not lie in $W_n(\theta)$. Then we have

(c) $P[\rho_n(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$.

(d) For any sequence of set functions $\{\rho'_n(E_n)\}$ which satisfies (c), the least upper bound of
$$P[\rho_n(E_n)\;\mathrm{c}\;\theta' \mid \theta''] - P[\rho'_n(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$$
in the domain $\theta' > \theta''$ $(\theta' < \theta'')$ converges to zero with $n \to \infty$.
Proposition II: Let $\{W_n(\theta)\}$ be, for each $\theta$, a sequence of regions such that the following three conditions are fulfilled:

(a) $P[W_n(\theta) \mid \theta] = 1 - \alpha$ for all values of $\theta$.

(b) The greatest lower bound of $P[W_n(\theta') \mid \theta'']$ converges to $1 - \alpha$ with $n \to \infty$.

(c) For any sequence $\{W'_n(\theta)\}$ which satisfies (a) and (b), the least upper bound of
$$P[W'_n(\theta') \mid \theta''] - P[W_n(\theta') \mid \theta'']$$
with respect to $\theta'$ and $\theta''$ converges to 0 with $n \to \infty$.

Denote by $\rho_n(E_n)$ the set of all values of $\theta$ for which $E_n$ does not lie in $W_n(\theta)$. Then we have

(d) $P[\rho_n(E_n)\;\mathrm{c}\;\theta \mid \theta] = \alpha$ for all values of $\theta$.

(e) The least upper bound of $P[\rho_n(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$ converges to $\alpha$ with $n \to \infty$.

(f) For any sequence of set functions $\{\rho'_n(E_n)\}$ which satisfies (d) and (e), the least upper bound of
$$P[\rho_n(E_n)\;\mathrm{c}\;\theta' \mid \theta''] - P[\rho'_n(E_n)\;\mathrm{c}\;\theta' \mid \theta'']$$
with respect to $\theta'$ and $\theta''$ converges to 0 with $n \to \infty$.





The validity of the above propositions follows easily from the identity

$$P[\rho_n(E_n) \ni \theta' \mid \theta''] = 1 - P[W_n(\theta') \mid \theta''].$$
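The duality behind Propositions I and II (a confidence set is the set of parameter values whose critical regions do not contain the sample) can be illustrated numerically. The following Python sketch is not taken from the paper: it assumes a hypothetical normal-mean family $N(\theta, 1)$ with a two-sided critical region of size $1-\alpha$, inverts the tests over a grid of $\theta$ values, and estimates by simulation the probability that the resulting set contains the true $\theta$. The helper names `in_W`, `rho` and `coverage` are illustrative only.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical illustration (not from the paper): x_1, ..., x_n are i.i.d. N(theta, 1).
# As in the text, alpha is the confidence coefficient, so each critical region
# W_n(theta) has size 1 - alpha.
ALPHA = 0.95
CRIT = NormalDist().inv_cdf((1 + ALPHA) / 2)


def in_W(sample, theta):
    """True if the sample point E_n lies in the critical region W_n(theta)."""
    return abs(math.sqrt(len(sample)) * (fmean(sample) - theta)) > CRIT


def rho(sample, grid):
    """rho_n(E_n): the grid values of theta for which E_n does NOT lie in W_n(theta)."""
    return [theta for theta in grid if not in_W(sample, theta)]


def coverage(theta, n=25, reps=2000, seed=1):
    """Monte Carlo estimate of P[rho_n(E_n) contains theta | theta].

    theta belongs to rho_n(E_n) exactly when E_n is not in W_n(theta), so by the
    identity in the text this estimates 1 - P[W_n(theta) | theta] = alpha.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(theta, 1.0) for _ in range(n)]
        hits += not in_W(sample, theta)
    return hits / reps


sample = [0.1, -0.4, 0.7, 0.2, 0.0, 0.5, -0.1, 0.3, 0.9]
grid = [i / 1000 for i in range(-3000, 3001)]
interval = rho(sample, grid)
print(min(interval), max(interval), coverage(0.3))
```

For this family the inverted set is simply the grid approximation of $\bar x \pm z/\sqrt n$; the grid search is used only to mirror the definition of $\rho_n(E_n)$.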


3. Assumptions on the probability density function. For any function $\psi(x)$ denote by $E_\theta\psi(x)$ the expected value of $\psi(x)$ under the assumption that $\theta$ is the true value of the parameter, i.e.

$$E_\theta\psi(x) = \int_{-\infty}^{\infty}\psi(x)\,f(x,\theta)\,dx.$$

For any $x$, for any positive $\delta$, and for any real value $\theta'$, denote by $\varphi_1(x,\theta',\delta)$ the greatest lower bound, and by $\varphi_2(x,\theta',\delta)$ the least upper bound, of $\dfrac{\partial^2}{\partial\theta^2}\log f(x,\theta)$ in the interval $\theta'-\delta \le \theta \le \theta'+\delta$.

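Throughout this paper the following assumptions on $f(x,\theta)$ will be made:

Assumption I: The expectation $E_{\theta'}\dfrac{\partial}{\partial\theta''}\log f(x,\theta'')$ is a continuous function of $\theta'$ and $\theta''$, and for any pair of sequences $\{\theta'_n\}$ and $\{\theta''_n\}$ $(n = 1, 2, \dots, \text{ad inf.})$ for which

$$\lim_{n\to\infty} E_{\theta'_n}\frac{\partial}{\partial\theta''_n}\log f(x,\theta''_n) = 0$$

also

$$\lim_{n\to\infty}(\theta'_n - \theta''_n) = 0.$$

Furthermore

$$E_{\theta'}\Bigl[\frac{\partial^2}{\partial\theta''^2}\log f(x,\theta'')\Bigr]$$

is a bounded function of $\theta'$ and $\theta''$, and $-E_\theta\dfrac{\partial^2}{\partial\theta^2}\log f(x,\theta)$ has a positive lower bound.

Assumption II: There exists a positive value $k_0$ such that the expectations $E_{\theta'}\varphi_1(x,\theta'',\delta)$ and $E_{\theta'}\varphi_2(x,\theta'',\delta)$ are uniformly continuous functions of $\theta'$, $\theta''$ and $\delta$, where $\delta$ takes only values for which $|\delta| \le k_0$. Furthermore it is assumed that $E_{\theta'}[\varphi_i(x,\theta'',\delta)]^2$ $(i = 1, 2)$ are bounded functions of $\theta'$, $\theta''$ and $\delta$ $(|\delta| \le k_0)$.

Assumption III: The relations

$$\int_{-\infty}^{\infty}\frac{\partial}{\partial\theta}f(x,\theta)\,dx = \int_{-\infty}^{\infty}\frac{\partial^2}{\partial\theta^2}f(x,\theta)\,dx = 0$$

hold.

The above assumption means simply that we may differentiate with respect to $\theta$ under the integral sign. In fact

$$\int_{-\infty}^{\infty} f(x,\theta)\,dx = 1$$

identically in $\theta$. Hence

$$\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty} f(x,\theta)\,dx = \frac{\partial^2}{\partial\theta^2}\int_{-\infty}^{\infty} f(x,\theta)\,dx = 0.$$

Differentiating under the integral sign, we obtain the relations in Assumption III.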
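As a numerical sanity check of Assumption III, the following sketch assumes the hypothetical density $f(x,\theta) = \theta e^{-\theta x}$ $(x > 0)$, which is not discussed in the paper, and integrates $\partial f/\partial\theta$ and $\partial^2 f/\partial\theta^2$ with a composite Simpson rule. It also checks the identity $E_\theta[\partial\log f/\partial\theta]^2 = -E_\theta\,\partial^2\log f/\partial\theta^2$, which is used later in the proof of Proposition III. The function names are illustrative.

```python
import math


def simpson(g, a, b, m=20000):
    """Composite Simpson rule on [a, b] with an even number m of subintervals."""
    if m % 2:
        m += 1
    h = (b - a) / m
    total = g(a) + g(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3


def check_exponential(theta):
    """Numerical check of Assumption III (and of the information identity) for the
    hypothetical density f(x, theta) = theta * exp(-theta * x), x > 0."""

    def f(x):
        return theta * math.exp(-theta * x)

    def df(x):      # d f / d theta
        return (1 - theta * x) * math.exp(-theta * x)

    def d2f(x):     # d^2 f / d theta^2
        return (theta * x * x - 2 * x) * math.exp(-theta * x)

    def score(x):   # d log f / d theta
        return 1 / theta - x

    upper = 60 / theta  # the neglected tail is of order exp(-60)
    return {
        "int_df": simpson(df, 0.0, upper),
        "int_d2f": simpson(d2f, 0.0, upper),
        "E_score_sq": simpson(lambda x: score(x) ** 2 * f(x), 0.0, upper),
        # d^2 log f / d theta^2 = -1/theta^2 for this density
        "minus_E_d2log": -simpson(lambda x: -f(x) / theta ** 2, 0.0, upper),
    }


for theta in (0.5, 2.0):
    print(theta, check_exponential(theta))
```

Both integrals come out numerically zero, and both sides of the information identity equal $1/\theta^2$ for this density.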
Assumption IV: There exists a positive $\eta$ such that

$$E_\theta\Bigl|\frac{\partial}{\partial\theta}\log f(x,\theta)\Bigr|^{2+\eta}$$

is a bounded function of $\theta$.

4. Some theorems. The assumptions on $f(x,\theta)$ made in this paper become identical with the assumptions I-IV formulated in a previous paper⁴ if a certain set $\omega$ involved in those assumptions is put equal to the whole real axis $(-\infty, +\infty)$. Hence we can make use of all results obtained in that paper, putting $\omega = (-\infty, +\infty)$. Among others, the following statements have been proved there.

(A) Denote $\dfrac{1}{\sqrt n}\displaystyle\sum_{\alpha=1}^{n}\dfrac{\partial}{\partial\theta}\log f(x_\alpha,\theta)$ by $y_n(\theta, E_n)$ and let $R_n(\theta)$ be the region defined by the inequality $y_n(\theta, E_n) > A_n(\theta)$, where $A_n(\theta)$ is chosen such that $P[R_n(\theta) \mid \theta] = 1 - \alpha$. Then for any sequence of regions $\{Z_n(\theta)\}$ for which $P[Z_n(\theta) \mid \theta] = 1 - \alpha$, the least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[R_n(\theta') \mid \theta'']$$

in the set $\theta'' > \theta'$ converges to 0 with $n \to \infty$.

(B) Let $S_n(\theta)$ be the region defined by the inequality $y_n(\theta, E_n) < B_n(\theta)$, where $B_n(\theta)$ is defined such that $P[S_n(\theta) \mid \theta] = 1 - \alpha$. Then for any sequence of regions $\{Z_n(\theta)\}$ for which $P[Z_n(\theta) \mid \theta] = 1 - \alpha$, the least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[S_n(\theta') \mid \theta'']$$

in the set $\theta'' < \theta'$ converges to 0 with $n \to \infty$.

(C) Denote by $T_n(\theta)$ the region defined by $|y_n(\theta, E_n)| > C_n(\theta)$, where $C_n(\theta)$ is chosen such that

(a) $P[T_n(\theta) \mid \theta] = 1 - \alpha$.

Then $T_n(\theta)$ also satisfies the following two conditions:

(b) The greatest lower bound of $P[T_n(\theta') \mid \theta'']$ converges to $1 - \alpha$ with $n \to \infty$.

(c) For any sequence of regions $\{Z_n(\theta)\}$ which satisfies (a) and (b), the least upper bound of

$$P[Z_n(\theta') \mid \theta''] - P[T_n(\theta') \mid \theta'']$$

converges to 0 with $n \to \infty$.

⁴ A. Wald, "Some examples of asymptotically most powerful tests," Annals of Math. Stat., Vol. 12 (1941), pp. 396-408.





On account of Propositions I and II we easily get the following theorems:

Theorem I: Denote by $\xi_n(E_n)$ the set of all values of $\theta$ for which $y_n(\theta, E_n) \le A_n(\theta)$, where $A_n(\theta)$ is defined such that $P[y_n(\theta, E_n) > A_n(\theta) \mid \theta] = 1 - \alpha$. Then $\xi_n(E_n)$ satisfies the following two conditions:

(a) $P[\xi_n(E_n) \ni \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of set functions $\{\xi'_n(E_n)\}$ which satisfies condition (a), the least upper bound of

$$P[\xi_n(E_n) \ni \theta' \mid \theta''] - P[\xi'_n(E_n) \ni \theta' \mid \theta'']$$

in the set $\theta'' > \theta'$ converges to 0 with $n \to \infty$.

Theorem II: Denote by $\zeta_n(E_n)$ the set of all values of $\theta$ for which $y_n(\theta, E_n) \ge B_n(\theta)$, where $B_n(\theta)$ is defined such that $P[y_n(\theta, E_n) < B_n(\theta) \mid \theta] = 1 - \alpha$. Then $\zeta_n(E_n)$ satisfies the following two conditions:

(a) $P[\zeta_n(E_n) \ni \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) For any sequence of set functions $\{\zeta'_n(E_n)\}$ which satisfies condition (a), the least upper bound of

$$P[\zeta_n(E_n) \ni \theta' \mid \theta''] - P[\zeta'_n(E_n) \ni \theta' \mid \theta'']$$

in the set $\theta'' < \theta'$ converges to 0 with $n \to \infty$.

Theorem III: Denote by $\rho_n(E_n)$ the set of all values of $\theta$ for which $|y_n(\theta, E_n)| \le C_n(\theta)$, where $C_n(\theta)$ is chosen such that $P[|y_n(\theta, E_n)| > C_n(\theta) \mid \theta] = 1 - \alpha$. Then $\rho_n(E_n)$ satisfies the following three conditions:

(a) $P[\rho_n(E_n) \ni \theta \mid \theta] = \alpha$ for all values of $\theta$.

(b) The least upper bound of $P[\rho_n(E_n) \ni \theta' \mid \theta'']$ converges to $\alpha$ with $n \to \infty$.

(c) For any sequence of set functions $\{\rho'_n(E_n)\}$ which satisfies conditions (a) and (b), the least upper bound of

$$P[\rho_n(E_n) \ni \theta' \mid \theta''] - P[\rho'_n(E_n) \ni \theta' \mid \theta'']$$

converges to zero with $n \to \infty$.

Now we shall investigate the question whether the sets $\xi_n(E_n)$, $\zeta_n(E_n)$ and $\rho_n(E_n)$ are intervals. For this purpose we will prove some propositions.

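Proposition III: Let $\epsilon$ and $D$ be two positive numbers such that $\epsilon < D$. Denote by $Q_n(\theta, \epsilon, D)$ the region which consists of all points $E_n$ for which

$$y_n(\theta + \epsilon', E_n) < -n^{1/4} \quad\text{and}\quad y_n(\theta - \epsilon', E_n) > n^{1/4}$$

for all values $\epsilon'$ in the interval $[\epsilon, D]$. Then we have

$$\lim_{n\to\infty} P[Q_n(\theta, \epsilon, D) \mid \theta] = 1 \tag{1}$$

uniformly in $\theta$.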

Proof: Let $\epsilon_1, \epsilon_2, \dots, \epsilon_r$ be a sequence of points in the interval $[\epsilon, D]$ such that $\epsilon_1 - \epsilon = \epsilon_2 - \epsilon_1 = \dots = \epsilon_r - \epsilon_{r-1} = D - \epsilon_r = k_0$ (say), where $r$ is chosen sufficiently large such that Assumption II holds for $|\delta| \le k_0$. Denote by $R_n(\theta, \epsilon_i)$ the region in which

$$y_n(\theta + \epsilon_i, E_n) < -n^{1/4}. \tag{2}$$

We will show that

$$\lim_{n\to\infty} P[R_n(\theta, \epsilon_i) \mid \theta] = 1 \tag{3}$$

uniformly in $\theta$.

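From Assumption I it follows that the greatest lower bound of

$$\Bigl|E_\theta\frac{\partial}{\partial\epsilon'}\log f(x, \theta + \epsilon')\Bigr|$$

with regard to $\epsilon'$ in the interval $[\epsilon, D]$ is positive. Let this greatest lower bound be $A > 0$. Since on account of Assumption I $E_\theta\frac{\partial}{\partial\epsilon'}\log f(x, \theta + \epsilon')$ is a continuous function of $\epsilon'$, it does not change sign in the interval $\epsilon \le \epsilon' \le D$. Since this is true for arbitrarily small $\epsilon$, and since

$$E_\theta\Bigl[\frac{\partial}{\partial\theta}\log f(x, \theta)\Bigr]^2 = -E_\theta\frac{\partial^2}{\partial\theta^2}\log f(x, \theta)$$

has a positive lower bound (Assumption I), it follows easily on account of Assumption II that

$$E_\theta\frac{\partial}{\partial\epsilon'}\log f(x, \theta + \epsilon') < 0.$$

Hence

$$E_\theta\frac{\partial}{\partial\epsilon'}\log f(x, \theta + \epsilon') \le -A < 0 \qquad\text{for } \epsilon \le \epsilon' \le D, \tag{4}$$

and therefore

$$E_\theta\, y_n(\theta + \epsilon', E_n) \le -A\sqrt n \qquad\text{for } \epsilon \le \epsilon' \le D. \tag{5}$$

From Assumption II it follows that the variance of $y_n(\theta + \epsilon, E_n)$ is a bounded function of $\theta$ and $\epsilon$. Hence

$$\lim_{n\to\infty} P\bigl[y_n(\theta + \epsilon_i, E_n) < -\tfrac12 A\sqrt n \bigm| \theta\bigr] = 1 \tag{6}$$

uniformly in $\theta$. Equation (3) is a consequence of (6).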
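Denote by $S_n(\theta, \epsilon_i)$ the region in which

$$\Bigl|\frac{1}{n}\sum_{\alpha=1}^{n}\varphi_j(x_\alpha, \theta + \epsilon_i, k_0)\Bigr| < C \qquad (j = 1, 2),$$

where $C$ is greater than the least upper bound of $|E_\theta\varphi_j(x, \theta', k_0)|$ with respect to $\theta$ and $\theta'$. Then we have on account of Assumption II

$$\lim_{n\to\infty} P[S_n(\theta, \epsilon_i) \mid \theta] = 1 \qquad (i = 1, 2, \dots, r) \tag{7}$$

uniformly in $\theta$. In the region $S_n(\theta, \epsilon_i)$ we obviously have

$$y_n(\theta + \epsilon'_i, E_n) \le y_n(\theta + \epsilon_i, E_n) + 2k_0\sqrt n\, C \tag{8}$$

for all values $\epsilon'_i$ in the interval $[\epsilon_i - k_0, \epsilon_i + k_0]$. By choosing $r$ sufficiently large we can always achieve that

$$2k_0 C < \frac{A}{4}.$$

Denote by $T_n(\theta, \epsilon_i)$ the region in which

$$y_n(\theta + \epsilon'_i, E_n) < -\tfrac14 A\sqrt n \qquad\text{for } \epsilon_i - k_0 \le \epsilon'_i \le \epsilon_i + k_0. \tag{9}$$

From (6), (7) and (8) we get

$$\lim_{n\to\infty} P[T_n(\theta, \epsilon_i) \mid \theta] = 1 \tag{10}$$

uniformly in $\theta$. Let $Q'_n(\theta, \epsilon, D)$ be the common part of the $r$ regions $T_n(\theta, \epsilon_1), \dots, T_n(\theta, \epsilon_r)$, i.e. $Q'_n(\theta, \epsilon, D)$ is the set of all points $E_n$ for which

$$y_n(\theta + \epsilon', E_n) < -\tfrac14 A\sqrt n$$

for all $\epsilon'$ in the interval $[\epsilon, D]$. Since $r$ is a fixed positive integer not depending on $n$, we get from (10)

$$\lim_{n\to\infty} P[Q'_n(\theta, \epsilon, D) \mid \theta] = 1 \tag{11}$$

uniformly in $\theta$.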

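In the same way we can prove that

$$\lim_{n\to\infty} P[Q''_n(\theta, \epsilon, D) \mid \theta] = 1 \tag{12}$$

uniformly in $\theta$, where $Q''_n(\theta, \epsilon, D)$ denotes the region in which

$$y_n(\theta - \epsilon', E_n) > \tfrac14 A\sqrt n \qquad\text{for all } \epsilon' \text{ in } [\epsilon, D].$$

Proposition III follows from (11) and (12), since $\tfrac14 A\sqrt n > n^{1/4}$ for all sufficiently large $n$.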

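Proposition IV: Denote by $V_n(\theta, \epsilon)$ the region in which

$$\frac{\partial}{\partial\theta'}\,y_n(\theta', E_n) < -n^{1/4}$$

for all values $\theta'$ in the interval $[\theta - \epsilon, \theta + \epsilon]$. There exists a positive $\epsilon$ such that

$$\lim_{n\to\infty} P[V_n(\theta, \epsilon) \mid \theta] = 1$$

uniformly in $\theta$.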

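Proof: Since $\varphi_2(x, \theta, 0) = \frac{\partial^2}{\partial\theta^2}\log f(x, \theta)$, Assumption I implies that the least upper bound of $E_\theta\varphi_2(x, \theta, 0)$ is negative; hence we get from Assumption II that the least upper bound of $E_\theta\varphi_2(x, \theta, \epsilon)$ is negative for sufficiently small $\epsilon > 0$. Denote the least upper bound of $E_\theta\varphi_2(x, \theta, \epsilon)$ by $-B$ and let the region in which

$$\frac{1}{n}\sum_{\alpha=1}^{n}\varphi_2(x_\alpha, \theta, \epsilon) < -\tfrac12 B$$

be denoted by $W_n(\theta, \epsilon)$. From Assumption II it follows that

$$\lim_{n\to\infty} P[W_n(\theta, \epsilon) \mid \theta] = 1$$

uniformly in $\theta$. In $W_n(\theta, \epsilon)$ we have, for every $\theta'$ in $[\theta - \epsilon, \theta + \epsilon]$,

$$\frac{\partial}{\partial\theta'}\,y_n(\theta', E_n) = \frac{1}{\sqrt n}\sum_{\alpha=1}^{n}\frac{\partial^2}{\partial\theta'^2}\log f(x_\alpha, \theta') \le \frac{1}{\sqrt n}\sum_{\alpha=1}^{n}\varphi_2(x_\alpha, \theta, \epsilon) < -\tfrac12 B\sqrt n,$$

and $-\frac12 B\sqrt n < -n^{1/4}$ for all sufficiently large $n$. Hence for almost all $n$ the region $W_n(\theta, \epsilon)$ is a subset of $V_n(\theta, \epsilon)$, and Proposition IV is proved.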

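Proposition V: Let $A_n(\theta)$, $B_n(\theta)$, $C_n(\theta)$ be the functions defined in Theorems I-III. There exists a finite value $G$ such that

$$|A_n(\theta)| < G, \qquad |B_n(\theta)| < G \qquad\text{and}\qquad |C_n(\theta)| < G$$

for all $\theta$ and all $n$.

Proposition V follows easily (by Chebyshev's inequality, since $E_\theta\, y_n(\theta, E_n) = 0$ on account of Assumption III) from the fact that the variance of $y_n(\theta, E_n)$ is a bounded function of $n$ and $\theta$.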

Let $D$ be an arbitrary positive number and denote by $W_n(\theta, D)$ the region consisting of all points $E_n$ for which the following conditions are fulfilled:

(a) The equation $y_n(\theta', E_n) = A_n(\theta')$ has exactly one root in $\theta'$ which lies in the interval $[\theta - D, \theta + D]$.

(b) The equation $y_n(\theta', E_n) = B_n(\theta')$ has exactly one root in $\theta'$ which lies in the interval $[\theta - D, \theta + D]$.

(c) The equation $y_n(\theta', E_n) = C_n(\theta')$ has exactly one root in $\theta'$ which lies in the interval $[\theta - D, \theta + D]$.

(d) The equation $y_n(\theta', E_n) = -C_n(\theta')$ has exactly one root in $\theta'$ which lies in the interval $[\theta - D, \theta + D]$.

(e) The common part of $[\theta - D, \theta + D]$ and $\xi_n(E_n)$ is the interval $[\theta'_n(E_n), \theta + D]$, where $\theta'_n(E_n)$ denotes the root of the equation in (a).

(f) The common part of $\zeta_n(E_n)$ and $[\theta - D, \theta + D]$ is the interval $[\theta - D, \theta''_n(E_n)]$, where $\theta''_n(E_n)$ denotes the root of the equation in (b).

(g) The common part of $\rho_n(E_n)$ and $[\theta - D, \theta + D]$ is the interval $[\underline\theta_n(E_n), \bar\theta_n(E_n)]$, where $\underline\theta_n(E_n)$ denotes the root of the equation in (c) and $\bar\theta_n(E_n)$ denotes the root of the equation in (d).

From Propositions III-V the following proposition follows easily.

Proposition VI: For any positive value $D$

$$\lim_{n\to\infty} P[W_n(\theta, D) \mid \theta] = 1$$

uniformly in $\theta$, provided that the functions $A_n(\theta)$, $B_n(\theta)$ and $C_n(\theta)$ are continuous and of bounded variation in any finite interval.

We will show that Proposition VI remains valid for $D = +\infty$ if we make the following




Assumption V: Denote by $\psi(x, \theta, D)$ the least upper bound of $\dfrac{\partial}{\partial\theta'}\log f(x, \theta')$ with respect to $\theta'$ where $\theta' > \theta + D$. Denote furthermore by $\psi^*(x, \theta, D)$ the greatest lower bound of $\dfrac{\partial}{\partial\theta'}\log f(x, \theta')$ with respect to $\theta'$ where $\theta' < \theta - D$. There exists a positive $D$ such that the least upper bound of $E_\theta\psi(x, \theta, D)$ with respect to $\theta$ is negative, the greatest lower bound of $E_\theta\psi^*(x, \theta, D)$ with respect to $\theta$ is positive, and the variances of $\psi(x, \theta, D)$ and $\psi^*(x, \theta, D)$ are bounded functions of $\theta$. (The variances are calculated under the assumption that $\theta$ is the true value of the parameter.)

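It follows easily from Assumption V that

$$\lim_{n\to\infty} P\Bigl[\frac{1}{\sqrt n}\sum_{\alpha=1}^{n}\psi(x_\alpha, \theta, D) < -n^{1/4} \Bigm| \theta\Bigr] = \lim_{n\to\infty} P\Bigl[\frac{1}{\sqrt n}\sum_{\alpha=1}^{n}\psi^*(x_\alpha, \theta, D) > n^{1/4} \Bigm| \theta\Bigr] = 1$$

uniformly in $\theta$. Since

$$\frac{1}{\sqrt n}\sum_{\alpha=1}^{n}\psi(x_\alpha, \theta, D) \ge y_n(\theta', E_n) \qquad\text{for } \theta' > \theta + D$$

and

$$\frac{1}{\sqrt n}\sum_{\alpha=1}^{n}\psi^*(x_\alpha, \theta, D) \le y_n(\theta', E_n) \qquad\text{for } \theta' < \theta - D,$$

Proposition VI remains valid if we substitute $+\infty$ for $D$.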

Hence we obtain the following

Corollary: If the assumptions I-V are fulfilled and if $A_n(\theta)$, $B_n(\theta)$ and $C_n(\theta)$ are continuous and of bounded variation in any finite interval, then

(a) The root $\theta'_n(E_n)$ of the equation $y_n(\theta, E_n) = A_n(\theta)$ in $\theta$ is an asymptotically best lower estimate of $\theta$.

(b) The root $\theta''_n(E_n)$ of the equation $y_n(\theta, E_n) = B_n(\theta)$ in $\theta$ is an asymptotically best upper estimate of $\theta$.

(c) The interval $[\underline\theta_n(E_n), \bar\theta_n(E_n)]$ is an asymptotically shortest unbiased confidence interval of $\theta$, where $\underline\theta_n(E_n)$ denotes the root of the equation $y_n(\theta, E_n) = +C_n(\theta)$, and $\bar\theta_n(E_n)$ denotes the root of the equation $y_n(\theta, E_n) = -C_n(\theta)$.
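Part (c) of the Corollary can be sketched for a concrete case. The Python example below is an illustration under stated assumptions, not Wald's procedure: it takes the hypothetical density $f(x,\theta) = \theta e^{-\theta x}$, for which $y_n(\theta,E_n) = \sqrt n\,(1/\theta - \bar x)$, and replaces the exact $C_n(\theta)$ by the large-sample approximation $C_n(\theta) \approx z/\theta$, where $z$ is the standard normal quantile of order $(1+\alpha)/2$ and $1/\theta^2$ is the Fisher information. Because $y_n$ is decreasing in $\theta$ (compare Proposition IV), each equation $y_n = \pm C_n$ has a single root, found here by bisection. The function names are illustrative.

```python
import math
from statistics import NormalDist


def y_n(theta, sample):
    """y_n(theta, E_n) = n**-0.5 * sum_a d/dtheta log f(x_a, theta), f = theta*exp(-theta*x)."""
    return sum(1 / theta - x for x in sample) / math.sqrt(len(sample))


def c_n(theta, alpha):
    """Large-sample stand-in for C_n(theta) (an assumption, not the exact C_n):
    y_n is approximately N(0, 1/theta**2) under theta, so P[|y_n| > c_n] ~ 1 - alpha."""
    return NormalDist().inv_cdf((1 + alpha) / 2) / theta


def decreasing_root(g, lo, hi, rel_tol=1e-12):
    """Bisection for a decreasing function g with g(lo) > 0 > g(hi)."""
    while hi - lo > rel_tol * max(1.0, hi):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def score_interval(sample, alpha=0.95, lo=1e-9, hi=1e9):
    """[root of y_n = +c_n, root of y_n = -c_n], cf. part (c) of the Corollary."""
    lower = decreasing_root(lambda t: y_n(t, sample) - c_n(t, alpha), lo, hi)
    upper = decreasing_root(lambda t: y_n(t, sample) + c_n(t, alpha), lo, hi)
    return lower, upper


data = [0.3, 1.2, 0.8, 0.1, 2.5, 0.6, 0.9, 1.7, 0.4, 0.2, 1.1, 0.5, 0.7, 1.4, 0.05, 0.95]
print(score_interval(data))
```

For this density the two roots also have the closed form $(1 \mp z/\sqrt n)/\bar x$ (valid when $\sqrt n > z$), which is a convenient check on the bisection.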

6. Some Remarks. 1. I should like to make a few remarks about the relationship of these results to those obtained by S. S. Wilks.⁵ The definition of a shortest confidence interval underlying Wilks' investigations is somewhat different from Neyman's, which has been used in this paper. According to Wilks, a confidence interval $\delta(E_n)$ is called shortest in the average if the expected value of the length of $\delta(E_n)$ is a minimum. The main result obtained by Wilks can be formulated as follows: the confidence interval $[\underline\theta_n(E_n), \bar\theta_n(E_n)]$ given in our Corollary is asymptotically shortest in the average compared with all confidence intervals computed on the basis of functions belonging to a certain class $C$. In the present paper no restriction to a certain class of functions has been made.

⁵ S. S. Wilks, "Shortest average confidence intervals from large samples," Annals of Math. Stat., Vol. 9 (1938), pp. 166-175.

2. If the parameter space is not the whole real axis, but an open subset $\Omega$ of it, and if the assumptions I-V are fulfilled when $\theta$ can take only values in $\Omega$, the previously proved Corollary remains valid. If $\Omega$ is a bounded set, Assumption V is a consequence of Assumptions I-IV.



GROUPING METHODS

By P. S. Dwyer

University of Michigan

1. Introduction. The conventional formulas for moment adjustment known as Sheppard's corrections are not too satisfactory for practical use. As Carver has pointed out [1], Sheppard's corrections are merely systematic adjustments which eliminate the bias introduced by grouping. The values of the moments after Sheppard's corrections have been applied may be looked upon as unbiased grouping estimates of the true moments, while the uncorrected values constitute biased estimates.

In practice one obtains one's moments from a single grouping. The application of Sheppard's adjustments in such a case does not necessarily result in the unbiased estimate being closer to the true moment than is the biased estimate and, in an appreciable percentage of cases, the unbiased estimate is further from the true moment than is the biased estimate. One does not know, when applying Sheppard's adjustments to the results of a single grouping, whether or not one is making a correction in the right direction.

This situation is not too satisfactory, and yet practical necessity demands some method of grouping. The improvement of modern calculating machines tends to push grouping techniques further into the background since, in many cases, the machines permit the determination of the actual values of the moments without grouping in a reasonable amount of time. But even here it is possible to use grouping methods and to get a good estimate of the true value in a fraction of the time. It is the purpose of this paper to present some new grouping methods which are useful in obtaining much better unbiased estimates from a single grouping than can be obtained with the use of Sheppard's corrections. These methods demand additional work, but this additional work is justified by the additional precision resulting when such precision is desired.

The spirit of the new approach, which in one sense is a generalization of the earlier approach, can be expressed very simply, though the details of the development and the calculational methods demand amplification. If we let $x$ be the true value, $x'$ the grouped value (the value of the class mark of the group in which $x$ lies), and $\epsilon$ the error, i.e. the difference between the true value and the grouped value, then

$$\epsilon = x - x', \qquad x = x' + \epsilon, \qquad x' = x - \epsilon. \tag{1}$$

In the classical theory we use $\sum x'^s$ as the biased grouping estimate of $\sum x^s$. In the new methods we use $\sum x x'^{s-1}$ as the biased estimate or, if we desire more precision, $\sum x^2 x'^{s-2}$. It is then possible to correct $\sum x x'^{s-1}$ and $\sum x^2 x'^{s-2}$ for grouping bias just as we now correct $\sum x'^s$ for grouping bias. It is also possible to use the values of $\sum x'^s$ and $\sum x x'^{s-1}$ together in obtaining a better unbiased estimate, or to use the values of $\sum x'^s$, $\sum x x'^{s-1}$ and $\sum x^2 x'^{s-2}$ in obtaining a still better unbiased estimate of $\sum x^s$.
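The three kinds of biased estimate can be computed directly from ungrouped data. This Python sketch assumes integer variates and classes of `width` units beginning at a hypothetical origin `start`; the function names are illustrative and not from the paper. Exact fractions are used so that class marks of even-width classes (which fall halfway between recorded values) stay exact.

```python
from fractions import Fraction


def class_mark(x, start, width):
    """Class mark (mid-value) of the class containing the integer variate x, when the
    classes are start..start+width-1, start+width..start+2*width-1, and so on."""
    j = (x - start) // width                    # index of the class containing x
    return start + j * width + Fraction(width - 1, 2)


def biased_estimates(data, start, width, s):
    """Biased estimates of sum x**s used by method 1 (sum x'^s), method 2
    (sum x x'^(s-1)) and method 3 (sum x^2 x'^(s-2)); method 3 needs s >= 2."""
    pairs = [(x, class_mark(x, start, width)) for x in data]
    method1 = sum(xp ** s for _, xp in pairs)
    method2 = sum(x * xp ** (s - 1) for x, xp in pairs)
    method3 = sum(x * x * xp ** (s - 2) for x, xp in pairs)
    return method1, method2, method3


data = [-3, -1, 0, 2, 4, 5]
print(biased_estimates(data, start=-4, width=3, s=2))
```

Note that for $s = 2$ the method 3 "estimate" is already $\sum x^2$ itself, which is why method 3 rows in Table I reproduce $\sum x^2$ exactly.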

2. Illustration. The relative merits of the conventional method and the proposed methods can be shown effectively by means of an illustration. For this purpose I have selected the problem used previously [1, 154] in showing the variations in grouped results. The power sums rather than the moments are used, and the origin is taken at a point near the mean so that the relative variations are as large as possible. If the values of the power sums were "padded" by measurement about zero, the relative variations would not appear as large. However, a problem which shows considerable variation is an appropriate one with which to demonstrate the improvements introduced by the new methods; in the problem under consideration, the nine unbiased grouping estimates of $\sum x^3$ obtained from $\sum x'^3$ for the nine groupings do not even have the same sign.

The problem consists of 244 discrete variates which range in value from 64 to 155. Carver took a class interval of nine and formed the nine frequency distributions which result when class intervals of nine are chosen in all possible ways. He computed the values of $\sum x'$, $\sum x'^2$, $\sum x'^3$, $\sum x'^4$ for each of the nine distributions, corrected each for bias with the use of the Sheppard adjustments, and showed that the averages of the nine corrected estimates are respectively the values of $\sum x$, $\sum x^2$, $\sum x^3$, $\sum x^4$.
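Carver's observation can be checked on made-up data. The sketch below assumes integer variates recorded in units $h = 1$, forms all $k$ groupings obtained by shifting the class boundaries, applies the Sheppard-type corrections given as formulas (8) in Section 3, and confirms that the average of the corrected estimates reproduces the exact power sums. Exact rational arithmetic (`fractions.Fraction`) avoids rounding; the data and function names are hypothetical.

```python
from fractions import Fraction


def sheppard_estimates(data, start, k):
    """Method-1 unbiased estimates of sum x, sum x^2, sum x^3, sum x^4 from one
    grouping of integer variates into classes of k units beginning at `start`."""
    R2 = Fraction(k * k - 1, 12)
    R4 = Fraction((k * k - 1) * (3 * k * k - 7), 240)
    marks = [start + ((x - start) // k) * k + Fraction(k - 1, 2) for x in data]
    n = len(data)
    s1, s2, s3, s4 = (sum(m ** p for m in marks) for p in range(1, 5))
    return (s1,
            s2 - R2 * n,
            s3 - 3 * R2 * s1,
            s4 - 6 * R2 * s2 + (6 * R2 ** 2 - R4) * n)


def average_over_groupings(data, k):
    """Average of the corrected estimates over the k possible groupings."""
    a = min(data)
    runs = [sheppard_estimates(data, a - r, k) for r in range(k)]
    return tuple(sum(column) / k for column in zip(*runs))


data = [3, 7, 7, 10, 12, 15, 16, 16, 21, 30]
print(average_over_groupings(data, 9))
print(tuple(sum(x ** p for x in data) for p in range(1, 5)))
```

Each single grouping generally misses the true sums; it is only the average over all $k$ shifts that is exact, because across the shifts every variate occupies each position in its class exactly once.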

In Table I are presented the values of the biased and unbiased grouping estimates of $\sum x$, $\sum x^2$, $\sum x^3$, $\sum x^4$ which result from the use of (1) $\sum x'^s$; (2) $\sum x x'^{s-1}$; (3) $\sum x^2 x'^{s-2}$; (4) $\sum x'^s$ and $\sum x x'^{s-1}$; and (5) $\sum x'^s$, $\sum x x'^{s-1}$ and $\sum x^2 x'^{s-2}$. The results are presented here for comparison only; the details of the computation are explained later. Rows of biased estimates are indicated by B, while rows of unbiased estimates are indicated by U. Parentheses are used to indicate entries which, while appearing in rows of biased estimates, are actually unbiased. The exact values of $\sum x^s$, when they appear, are indicated by underscoring. The Roman numerals indicate the different frequency distributions, while the grouping methods are indicated by the numbers (1), (2), (3), (4), (5) above. The true values are $\sum x = -129$, $\sum x^2 = 77{,}591$, $\sum x^3 = -52{,}005$, $\sum x^4 = 69{,}239{,}951$, where the values of $x$ used are the values of the original variates decreased by 105.

The information contained in Table I deserves more than cursory examination. Study shows that the estimates resulting from method 2 are much closer to the true values than are the estimates resulting from method 1, and so on. Table II is presented below in order to facilitate the comparison of the relative amounts of grouping error involved in the different methods. The standard deviation of the grouping error of the conventional method, method 1, is used as a norm, and the standard deviations of the grouping errors for the new methods are compared with this norm.

The decline in the size of the error revealed in Table II indicates a decided decrease in grouping errors. Grouping method 2 enables one to compute the mean exactly, and this is always possible when method 2 is applied to discrete data. There is also a corresponding decrease in the errors of the higher power sums, to roughly one-half, two-thirds and three-fourths of those of method 1 for $\sum x^2$, $\sum x^3$ and $\sum x^4$ respectively. Greater precision in the case of the higher power sums can be obtained with the use of the other methods, though these methods demand more calculation.

TABLE I

Biased and Unbiased Grouping Estimates by Different Methods

Grouping  Method      Σx         Σx²          Σx³           Σx⁴
True values          -129      77,591      -52,005     69,239,951

I         1-B      (-181)      77,149     -134,191     69,063,265
I         2-B      (-129)    (76,593)     -105,105     68,267,577
I         3-B        —       (77,591)     (-77,825)    68,033,657
I         1-U       -181      75,522*     -130,571     66,023,177
I         2-U       -129      76,593      -104,245     66,735,717
I         3-U        —        77,591       -77,825     67,516,383*
I         4-U       -129      77,663*      -49,513     69,001,817
I         5-U       -129      77,591       -52,351     69,103,537
II        1-B      (-218)      78,466      -54,602     74,519,962
II        2-B      (-129)    (77,181)      -52,977     72,296,337
II        3-B        —       (77,591)     (-52,465)    70,685,801
II        1-U       -218      76,839*      -50,242     71,427,194
II        2-U       -129      77,181       -52,117     70,752,717
II        3-U        —        77,591       -52,465     70,168,527*
II        4-U       -129      77,522*      -52,307     68,770,406
II        5-U       -129      77,591       -53,060     69,240,172
III       1-B      (-111)      77,769        2,889     71,465,409
III       2-B      (-129)    (76,797)      -17,037     70,053,093
III       3-B        —       (77,591)     (-35,097)    69,214,545
III       1-U       -111      76,142*        5,109     68,400,521
III       2-U       -129      76,797       -16,177     68,517,153
III       3-U        —        77,591       -35,097     68,697,271*
III       4-U       -129      77,451*      -59,409     68,945,609
III       5-U       -129      77,591       -51,291     [illegible]
IV        1-B      (-139)      79,747      -23,311     74,171,443
IV        2-B      (-129)    (77,790)      -34,464     72,095,004
IV        3-B        —       (77,591)     (-44,108)    70,592,774
IV        1-U       -139      78,120*      -20,531     71,027,435
IV        2-U       -129      77,790       -33,604     70,639,801
IV        3-U        —        77,591       -44,108     70,075,500*
IV        4-U       -129      77,459*      -59,350     69,037,511
IV        5-U       -129      77,591       -52,243     69,248,077
V         1-B      (-104)      81,934       19,666     76,143,874
V         2-B      (-129)    (78,891)       -4,521     73,690,207
V         3-B        —       (77,591)     (-28,387)    71,610,053
V         1-U       -104      80,307*       21,746     72,912,386
V         2-U       -129      78,891        -3,661     72,112,387
V         3-U        —        77,591       -28,387     71,092,779*
V         4-U       -129      77,474*      -55,475     69,142,430
V         5-U       -129      77,591       -51,932     69,312,700
VI        1-B       (-87)      80,145       16,551     72,467,541
VI        2-B      (-129)    (78,025)       -4,914     70,940,124
VI        3-B        —       (77,591)     (-27,714)    69,902,910
VI        1-U        -87      78,518*       18,291     69,307,613
VI        2-U       -129      78,025        -4,054     69,379,624
VI        3-U        —        77,591       -27,714     69,385,636*
VI        4-U       -129      77,641*      -50,424     69,536,657
VI        5-U       -129      77,591       -51,849     69,241,607
VII       1-B       (-52)      80,302      -36,118     71,851,930
VII       2-B      (-129)    (78,168)      -39,486     70,515,354
VII       3-B        —       (77,591)     (-44,492)    69,647,462
VII       1-U        -52      78,675*      -35,078     68,685,722
VII       2-U       -129      78,168       -38,626     68,951,994
VII       3-U        —        77,591       -44,492     69,130,188*
VII       4-U       -129      77,660*      -48,802     69,689,930
VII       5-U       -129      77,591       -51,136     69,260,146
VIII      1-B       (-89)      78,553     -101,357     68,426,497
VIII      2-B      (-129)    (77,352)      -82,416     67,969,816
VIII      3-B        —       (77,591)     (-65,788)    67,944,476
VIII      1-U        -89      76,926*      -99,577     65,330,249
VIII      2-U       -129      77,352       -81,556     66,412,776
VIII      3-U        —        77,591       -65,788     67,427,202*
VIII      4-U       -129    [illegible]    -47,114     69,711,437
VIII      5-U       -129      77,591       -51,473     69,210,235
IX        1-B      (-180)      78,894     -180,792     73,155,150
IX        2-B      (-129)    (77,517)     -134,865     71,407,737
IX        3-B        —       (77,591)     (-92,169)    70,183,341
IX        1-U       -180      77,267*     -177,192     70,045,262
IX        2-U       -129      77,517      -134,005     69,857,397
IX        3-U        —        77,591       -92,169     69,666,067*
IX        4-U       -129      77,766*      -45,501     69,323,762
IX        5-U       -129      77,591       -52,704     69,246,016

There is one more question which should be discussed before the general theory is presented, and that deals with the method of computation of the quantities $\sum x x'^{s-1}$ and $\sum x^2 x'^{s-2}$ in methods 2 and 3. Computational techniques are discussed in a later section of the paper, but enough should be given now to make the meaning of $\sum x x'^{s-1}$ and $\sum x^2 x'^{s-2}$ clear. In getting $\sum x'^s$, we recall, we need only the values of the class mark, $x'$, and the frequency associated with each, $f_{x'}$. To get the values of $\sum x x'^{s-1}$ we need, in addition to $x'$, the sum of the $x$ values which are grouped together in the class having the same class mark $x'$. We denote this value by $x_{x'}$ and we use it instead of the $f_{x'}$ of the usual method. In the case of method 3 we record $x^2_{x'}$, where $x^2_{x'}$ is the sum of the squares of all $x$ values having the same grouped value $x'$.

Let us examine the first grouping in Table I. The original 244 variates were recorded by Carver [1, 154], and he gave the values of $f_{x'}$ for each grouping. It is necessary for us to return to these original variates, but instead of counting the variates in a given group, we add them and we add their squares.

In obtaining the values for the first grouping in Table I the variates were transformed with the use of $x = v - 105$. The variates then ranged from $-41$ to 50, and the frequency distribution was made with mid values $x' = -37, -28, -19, -10$, etc. The values of $f_{x'}$, $x_{x'}$ and $x^2_{x'}$ were then computed and recorded in columns 2, 3, 4 of Table III. The next three columns are computational columns useful in obtaining the biased estimates recorded at the bottom of Table III, and also in Table I, with the use of

$$\sum x'^s = \sum f_{x'}\,x'^s, \qquad \sum x x'^{s-1} = \sum x_{x'}\,x'^{s-1}, \qquad \sum x^2 x'^{s-2} = \sum x^2_{x'}\,x'^{s-2}.$$

TABLE II

Standard Deviations of the Grouping Errors of the Different Methods Expressed as Percentages of the Standard Deviations of the Usual Method

Method     Σx      Σx²     Σx³     Σx⁴
1          100     100     100     100
2          0       48.0    65.5    74.3
3          —       0       32.1    49.1
4          0       8.8     7.3     13.8
5          0       0       0.9     1.5
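The foot of Table III can be reproduced from its first four columns alone. In this sketch (the names are illustrative) each row holds $x'$, $f_{x'}$, $x_{x'}$ and $x^2_{x'}$ as given in Table III, and $\sum x'^t F$ is formed for $F = f_{x'}$, $x_{x'}$ and $x^2_{x'}$; these are exactly the biased estimates of methods 1, 2 and 3 for Grouping I.

```python
# Rows of Table III (first grouping): class mark x', frequency f_x',
# class sum x_x' of the x values, class sum x^2_x' of their squares.
TABLE_III = [
    (53, 1, 50, 2500),
    (44, 1, 48, 2304),
    (35, 8, 287, 10351),
    (26, 16, 402, 10190),
    (17, 27, 475, 8515),
    (8, 45, 378, 3426),
    (-1, 53, -73, 369),
    (-10, 41, -386, 3924),
    (-19, 27, -507, 9701),
    (-28, 12, -338, 9594),
    (-37, 13, -465, 16717),
]


def foot_of_table(t, column):
    """Sum over classes of x'^t * F, where F is column 1 (f_x'), 2 (x_x') or 3 (x^2_x')."""
    return sum(row[0] ** t * row[column] for row in TABLE_III)


for column, name in ((1, "f"), (2, "x"), (3, "x^2")):
    print(name, [foot_of_table(t, column) for t in range(5 - column + 1)])
```

Only three multiplications per class are needed for each power, which is the practical point of recording $x_{x'}$ and $x^2_{x'}$ alongside $f_{x'}$.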




3. General formulas for corrections for grouping bias. We are next led to the question of correcting these estimates of $\sum x^s$ for the bias introduced by grouping. Before indicating the numerical work, we derive general formulas for correction for grouping bias.

We assume that the variates are recorded in units of $h$, which means that, in the case of discrete series, the smallest possible difference between any two unequal variates is equal to $h$. In case the distribution is continuous, the recorded values constitute a discrete series recorded in units of $h$. Thus heights may be recorded to the nearest inch, in which case $h$ is one inch, or to the nearest one hundredth of an inch, in which case $h$ is 1/100 inch, etc. We assume further that all possible groupings of $k$ different values are made. Thus if the smallest variate is $a$, then the values $x = a, a+h, a+2h, \dots, a+(k-1)h$ are thrown in a group with class mark $a + \frac12(k-1)h$. The $k$ possible sets of groupings of $k$ are made in this way.

TABLE III

Values of x', f_x', x_x' and x²_x' for the First Grouping, with Computation of Biased Estimates of Σx^s

  x'    f_x'    x_x'    x²_x'      x'²        x'³          x'⁴
  53      1      50     2,500     2,809    148,877    7,890,481
  44      1      48     2,304     1,936     85,184    3,748,096
  35      8     287    10,351     1,225     42,875    1,500,625
  26     16     402    10,190       676     17,576      456,976
  17     27     475     8,515       289      4,913       83,521
   8     45     378     3,426        64        512        4,096
  -1     53     -73       369         1         -1            1
 -10     41    -386     3,924       100     -1,000       10,000
 -19     27    -507     9,701       361     -6,859      130,321
 -28     12    -338     9,594       784    -21,952      614,656
 -37     13    -465    16,717     1,369    -50,653    1,874,161

Sums Σx'^t F, where F is the entry in column 2, 3 or 4:

           F = f_x'      F = x_x'      F = x²_x'
ΣF              244          -129         77,591
Σx'F           -181        76,593        -77,825
Σx'²F        77,149      -105,105     68,033,657
Σx'³F      -134,191    68,267,577
Σx'⁴F    69,063,265

We examine the error involved when a specific variate $x$ is replaced by the class mark $x'$ in each of these groupings. The values of the lower open limit $L$, the upper open limit $U$, the class mark $x'$, and the error $\epsilon = x - x'$ are indicated in Table IV. The $k$ different groupings indicated by the different rows show $x$ at the lower limit, $x$ one step above the lower limit, $x$ two steps above the lower limit, etc. It is at once apparent that the errors made in replacing $x$ by $x'$ in the $k$ different ways constitute the deviations from the mean of the rectangular distribution $h, 2h, 3h, \dots, kh$. We denote the moments about the mean of this rectangular distribution by $R_1, R_2, R_3, R_4, \dots$, and we use the notation $E(\epsilon^t)$ for the sum of the $t$th powers of the $k$ different $\epsilon$'s divided by $k$. It follows that $E(\epsilon^t) = R_t$. Now the values of $R_t$ are 0 when $t$ is odd and are well known when $t$ is even [2, 325]. The ones in which we are especially interested are

$$R_2 = \frac{k^2 - 1}{12}\,h^2 \qquad\text{and}\qquad R_4 = \frac{(k^2 - 1)(3k^2 - 7)}{240}\,h^4. \tag{2}$$
























































If an adjustment of scale is made so that the differences between successive class marks are unity, as is customary, the value of $h$ is $1/k$. The values of $R_2$ and $R_4$ are then

$$R_2 = \frac{1 - 1/k^2}{12}, \qquad R_4 = \frac{(1 - 1/k^2)(3 - 7/k^2)}{240}. \tag{3}$$

As the number of groupings increases, $1/k^2 \to 0$ and the appropriate values of the moments of the continuous rectangular distribution result. Thus

$$R_2 = \frac{1}{12}, \qquad R_4 = \frac{1}{80}, \qquad\text{and}\qquad 6R_2^2 - R_4 = \frac{7}{240}.$$
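Formula (3) and its limits can be checked with exact arithmetic. The following sketch compares (3) with the moments computed directly from the $k$ errors when $h = 1/k$, and evaluates $R_2$, $R_4$ and $6R_2^2 - R_4$ for increasing $k$; the function names are illustrative.

```python
from fractions import Fraction


def R2_R4_unit_marks(k):
    """Formula (3): R_2 and R_4 when class marks are one unit apart (h = 1/k)."""
    q = Fraction(1, k * k)
    return (1 - q) / 12, (1 - q) * (3 - 7 * q) / 240


def R2_R4_direct(k):
    """The same moments computed directly from the k errors (2j - k - 1) / (2k)."""
    errs = [Fraction(2 * j - k - 1, 2 * k) for j in range(1, k + 1)]
    return sum(e ** 2 for e in errs) / k, sum(e ** 4 for e in errs) / k


for k in (1, 3, 9, 99, 9999):
    R2, R4 = R2_R4_unit_marks(k)
    print(k, float(R2), float(R4), float(6 * R2 ** 2 - R4))
```

The printed values approach $1/12 \approx 0.0833$, $1/80 = 0.0125$ and $7/240 \approx 0.0292$; $k = 1$ gives zeros because a single-value "group" introduces no error.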


TABLE IV

Open Limits, Class Marks and Errors for the Different Groupings

Grouping   Position of x      L             U             x'                  ε = x - x'
1          x = L              x             x + (k-1)h    x + ½(k-1)h         -½(k-1)h
2          x = L + h          x - h         x + (k-2)h    x + ½(k-3)h         -½(k-3)h
3          x = L + 2h         x - 2h        x + (k-3)h    x + ½(k-5)h         -½(k-5)h
...        ...                ...           ...           ...                 ...
i          x = L + (i-1)h     x - (i-1)h    x + (k-i)h    x + ½[k-(2i-1)]h    -½[k-(2i-1)]h
...        ...                ...           ...           ...                 ...
k-1        x = L + (k-2)h     x - (k-2)h    x + h         x - ½(k-3)h         ½(k-3)h
k          x = L + (k-1)h     x - (k-1)h    x             x - ½(k-1)h         ½(k-1)h
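The rows of Table IV, and the statement that the errors are the deviations of $h, 2h, \dots, kh$ from their mean, can be generated mechanically. This sketch (hypothetical function names; the variate $x$ defaults to 0) builds row $i$ of Table IV and checks the second and fourth moments of the errors against formula (2).

```python
from fractions import Fraction


def table_iv_row(i, k, x=0, h=1):
    """Row i (i = 1, ..., k) of Table IV for a variate x: the lower limit L and upper
    limit U of the class containing x, its class mark x', and the error x - x'."""
    L = x - (i - 1) * h
    U = x + (k - i) * h
    mark = Fraction(L + U) / 2
    return L, U, mark, x - mark


def errors(k, h=1):
    """The k errors, one for each grouping (the last column of Table IV)."""
    return [table_iv_row(i, k, h=h)[3] for i in range(1, k + 1)]


for i in (1, 2, 3, 8, 9):
    print(i, table_iv_row(i, 9))
```

Because the errors depend only on $i$, $k$ and $h$, and not on $x$, every variate contributes the same set of $k$ errors, which is what makes lemma (4) possible.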



If now we let $F_x$ be any real function of $x$ defined for the values $x = a, a+h, a+2h, \dots$, we have at once the useful lemma

$$E\Bigl[\sum x^s \epsilon^t F_x\Bigr] = \sum x^s F_x\,E[\epsilon^t] = R_t \sum x^s F_x. \tag{4}$$

This results from the fact that the values of $x$, and of all functions of $x$, are unchanged by the groupings, even though the values of $x'$ and $\epsilon$ vary.

The $\Sigma$ in (4) indicates a summation with respect to the variates, while the summation with respect to the different errors is taken care of in the $E$ notation. The limits of the $\Sigma$ in (4) are purposely left indefinite so that either a serial or a frequency notation can be used. Thus if a serial notation is used, the limits are from 1 to $N$, the values of $x$ are the variates $x_i$, and $F_x$ becomes $F_{x_i}$. In this case $F_{x_i}$ may be set equal to unity to give the corrections of method 1, may be set equal to $x_i$ to give the corrections of method 2, or may be set equal to $x_i^2$ to give the corrections of method 3. In case the notation of the frequency distribution is preferred, the limits of the summation are the smallest variate and the largest variate, and the values of $x$ are the values of the different variates which occur. In this case we may have $F_x = f_x$, the frequency function, $F_x = x f_x$, or $F_x = x^2 f_x$.

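The continued application of (4) to the terms in the expansion of $\sum (x - e)^s F_x$ results in

$$E\left[\sum x'^s F_x\right] = E\left[\sum (x - e)^s F_x\right] = \sum_{t=0}^{s} (-1)^t \binom{s}{t} \sum x^{s-t} F_x\, E(e^t) = \sum_{t=0}^{s} (-1)^t \binom{s}{t} R_t \sum x^{s-t} F_x. \tag{5}$$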


The fact that $R_t = 0$ when $t$ is odd may be used in writing out the expansion. It is possible to work out a more general theory where the class mark is some other value (say the smallest variate) rather than the mid-value. In such a case formula (5) would apply, but the values of $R_t$ would be the moments of a rectangular distribution rather than the central moments. The above formula is sufficiently general for the purposes of this paper.

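Specific values of (5) when $s = 0, 1, 2, 3, 4$ are

$$\begin{aligned}
E\left[\sum F_x\right] &= \sum F_x \\
E\left[\sum x' F_x\right] &= \sum x F_x \\
E\left[\sum x'^2 F_x\right] &= \sum x^2 F_x + R_2 \sum F_x \\
E\left[\sum x'^3 F_x\right] &= \sum x^3 F_x + 3R_2 \sum x F_x \\
E\left[\sum x'^4 F_x\right] &= \sum x^4 F_x + 6R_2 \sum x^2 F_x + R_4 \sum F_x.
\end{aligned} \tag{6}$$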

These equations can be solved for $\sum F_x$, $\sum x F_x$, etc., in terms of the expected values. If we use the inverse operator and write $E^{-1}[B] = A$ instead of $E[A] = B$, we have

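$$\begin{aligned}
E^{-1}\left[\sum F_x\right] &= \sum F_x \\
E^{-1}\left[\sum x F_x\right] &= \sum x' F_x \\
E^{-1}\left[\sum x^2 F_x\right] &= \sum x'^2 F_x - R_2 \sum F_x \\
E^{-1}\left[\sum x^3 F_x\right] &= \sum x'^3 F_x - 3R_2 \sum x' F_x \\
E^{-1}\left[\sum x^4 F_x\right] &= \sum x'^4 F_x - 6R_2 \sum x'^2 F_x + (6R_2^2 - R_4) \sum F_x
\end{aligned} \tag{7}$$

and in general,

$$E^{-1}\left[\sum x^s F_x\right] = \sum x'^s F_x - \sum_{t=2}^{s} (-1)^t \binom{s}{t} R_t\, E^{-1}\left[\sum x^{s-t} F_x\right].$$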

These values $E^{-1}[\sum x^s F_x]$ are unbiased estimates of $\sum x^s F_x$, since

$$E\left\{E^{-1}\left[\sum x^s F_x\right]\right\} = \sum x^s F_x.$$
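Because $E$ averages over the $k$ equally likely groupings, the unbiasedness of $E^{-1}$ can be checked exactly rather than by sampling. The sketch below is a minimal illustration under stated assumptions: the variates are integers with spacing $h = 1$, the classes contain $k$ consecutive integers whose lower limits are congruent to a chosen shift modulo $k$, and the data and helper names are hypothetical. It applies the general recursion of (7) in each grouping and averages. The weights $F_x = 1, x, x^2$ correspond to methods 1, 2 and 3.

```python
from fractions import Fraction
from math import comb

def R(t, k):
    """t-th moment of the grouping errors (2i - 1 - k)/2, i = 1..k (h = 1)."""
    return sum(Fraction(2 * i - 1 - k, 2) ** t for i in range(1, k + 1)) / k

def class_mark(x, k, shift):
    """Mid-value of the class of k integers containing x; class lower
    limits are the integers congruent to `shift` modulo k."""
    return x - (x - shift) % k + Fraction(k - 1, 2)

def inverse(s, G, k):
    """E^{-1}[sum x^s F_x] from grouped sums G[j] = sum x'^j F_x, j = 0..s (eq. 7)."""
    U = []
    for j in range(s + 1):
        corr = sum((-1) ** t * comb(j, t) * R(t, k) * U[j - t] for t in range(2, j + 1))
        U.append(G[j] - corr)
    return U[s]

data = [3, 7, 8, 12, 15, 15, 21, 30]        # hypothetical integer variates
k = 5
methods = {1: lambda x: 1, 2: lambda x: x, 3: lambda x: x * x}   # F_x for methods 1-3

for method, F in methods.items():
    for s in range(6):
        estimates = []
        for shift in range(k):               # the k possible groupings
            marks = [class_mark(x, k, shift) for x in data]
            G = [sum(m ** j * F(x) for m, x in zip(marks, data)) for j in range(s + 1)]
            estimates.append(inverse(s, G, k))
        assert sum(estimates) / k == sum(x ** s * F(x) for x in data)
print("E^{-1} is exactly unbiased over the", k, "groupings")
```

Exact rational arithmetic is used so that the equality in the assertion is an identity rather than a floating-point approximation.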



P. S. DWYER


The corrections for method 1, the customary corrections, are obtained if a serial notation is used with $F_x = 1$. The corrections for method 2 are obtained if a serial notation is used with $F_x = x$. The corrections for method 3 are obtained with $F_x = x^2$. Thus we have



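$$\begin{aligned}
E^{-1}[N] &= N \\
E^{-1}\left[\sum x\right] &= \sum x' \\
E^{-1}\left[\sum x^2\right] &= \sum x'^2 - R_2 N \\
E^{-1}\left[\sum x^3\right] &= \sum x'^3 - 3R_2 \sum x' \\
E^{-1}\left[\sum x^4\right] &= \sum x'^4 - 6R_2 \sum x'^2 + (6R_2^2 - R_4) N
\end{aligned} \tag{8}$$

$$\begin{aligned}
E^{-1}\left[\sum x\right] &= \sum x \\
E^{-1}\left[\sum x^2\right] &= \sum x x' \\
E^{-1}\left[\sum x^3\right] &= \sum x x'^2 - R_2 \sum x \\
E^{-1}\left[\sum x^4\right] &= \sum x x'^3 - 3R_2 \sum x x', \quad \text{etc.,}
\end{aligned} \tag{9}$$

and

$$\begin{aligned}
E^{-1}\left[\sum x^2\right] &= \sum x^2 \\
E^{-1}\left[\sum x^3\right] &= \sum x^2 x' \\
E^{-1}\left[\sum x^4\right] &= \sum x^2 x'^2 - R_2 \sum x^2.
\end{aligned} \tag{10}$$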

These formulas are the ones used in obtaining the unbiased estimates in methods 1, 2, 3 from the biased estimates in Table I. In this case $R_2 = 20/3$, $6R_2^2 - R_4 = 188$, $N = 244$, and the values follow by direct substitution in (8), (9), (10) above.

4. Compound grouping formulas. So far nothing has been said about the calculation of the results by methods 4 and 5. These methods might be called compound grouping methods, since they utilize the biased results of more than one grouping method. The values of $\sum x'^s$ and $\sum x x'^{s-1}$ are needed for method 4, and the values of $\sum x'^s$, $\sum x x'^{s-1}$, $\sum x^2 x'^{s-2}$ for method 5. The formulas for method 4 are presented first. The argument is given in some detail for the value of $E^{-1}[\sum x^2]$. Now

$$\begin{aligned}
\sum x^2 = \sum (x' + e)^2 &= \sum x'^2 + 2\sum x' e + \sum e^2 \\
&= \sum x'^2 + 2\sum x'(x - x') + \sum e^2,
\end{aligned}$$

so that

$$\sum x^2 = -\sum x'^2 + 2\sum x x' + \sum e^2.$$





If the values of $e$ were known, we would have the exact value of $\sum x^2$, since we know $\sum x'^2$ and $\sum x x'$. However, we do not know these values of $e$ from a single grouping, so we derive a formula giving unbiased estimates of $\sum x^2$. We have at once

$$E\left[\sum x^2\right] = \sum x^2 = E\left[-\sum x'^2 + 2\sum x x'\right] + N R_2,$$

and since $N R_2 = E[N R_2]$ we have

$$\sum x^2 = E\left[-\sum x'^2 + 2\sum x x' + N R_2\right]$$

and

$$E^{-1}\left[\sum x^2\right] = -\sum x'^2 + 2\sum x x' + N R_2. \tag{11}$$

There is a relatively small error in this estimate, since the only error involved is the difference between $NR_2$ and the actual sum of the squares of the $e$'s. This formula is the basis of the values of $E^{-1}[\sum x^2]$ recorded in Table I under method 4. For example, in grouping I the estimate is $-77149 + 2(76593) + 244(20/3) = 77663\tfrac{2}{3}$, and this differs by only $72\tfrac{2}{3}$ from the exact value.
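The worked figure for grouping I can be reproduced with exact rational arithmetic. The sketch below is a minimal illustration of formula (11) that uses only the two grouped sums and the constants quoted in the text ($N = 244$, $R_2 = 20/3$). The function name is illustrative.

```python
from fractions import Fraction

def method4_sum_sq(sum_x1_sq, sum_x_x1, N, R2):
    """E^{-1}[sum x^2] = -sum x'^2 + 2 sum x x' + N R_2  (formula 11)."""
    return -sum_x1_sq + 2 * sum_x_x1 + N * R2

est = method4_sum_sq(77149, 76593, 244, Fraction(20, 3))
whole = est.numerator // est.denominator
print(est, whole, est - whole)   # 232991/3 77663 2/3
```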

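In a corresponding manner, we may prove

$$\begin{aligned}
E^{-1}\left[\sum x^3\right] &= -2\sum x'^3 + 3\sum x x'^2 + 3R_2 \sum x \\
E^{-1}\left[\sum x^4\right] &= -3\sum x'^4 + 4\sum x x'^3 + 6R_2\, E^{-1}\left[\sum x^2\right] + 3NR_4.
\end{aligned} \tag{12}$$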

Different values of $E^{-1}[\sum x^2]$ can be used. In the calculations of Table I the values $E^{-1}[\sum x^2] = \sum x x'$ from (9) were used, but the values $E^{-1}[\sum x^2] = -\sum x'^2 + 2\sum x x' + NR_2$ could be used to give somewhat better results.

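It can be shown also that

$$\begin{aligned}
E^{-1}\left[\sum x^5\right] &= -4\sum x'^5 + 5\sum x x'^4 + 10R_2\, E^{-1}\left[\sum x^3\right] + 15R_4 \sum x \\
E^{-1}\left[\sum x^6\right] &= -5\sum x'^6 + 6\sum x x'^5 + 15R_2\, E^{-1}\left[\sum x^4\right] + 45R_4\, E^{-1}\left[\sum x^2\right] + 5R_6 N,
\end{aligned} \tag{13}$$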


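and, after some argument, that

$$E^{-1}\left[\sum x^s\right] = -(s - 1)\sum x'^s + s\sum x x'^{s-1} + \sum_{t=1}^{[s/2]} \binom{s}{2t}(2t - 1) R_{2t}\, E^{-1}\left[\sum x^{s-2t}\right], \tag{14}$$

where $[\tfrac{1}{2}s]$ indicates the integer $\tfrac{1}{2}s$ or $\tfrac{1}{2}(s - 1)$.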
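Formula (14) is linear in the grouped sums, so averaging it over the $k$ groupings must return $\sum x^s$ exactly. The sketch below is a minimal check under the same assumptions as before: integer variates with $h = 1$, classes of $k$ consecutive integers, and hypothetical data and helper names. The two exact sums $\sum x^0 = N$ and $\sum x$ start the recursion.

```python
from fractions import Fraction
from math import comb

def R(t, k):
    """t-th moment of the grouping errors (2i - 1 - k)/2, i = 1..k (h = 1)."""
    return sum(Fraction(2 * i - 1 - k, 2) ** t for i in range(1, k + 1)) / k

def class_mark(x, k, shift):
    """Mid-value of the class of k integers containing x."""
    return x - (x - shift) % k + Fraction(k - 1, 2)

def method4(s, xs, marks, k):
    """E^{-1}[sum x^s] from sum x'^j and sum x x'^(j-1) via recursion (14)."""
    U = [Fraction(len(xs)), Fraction(sum(xs))]      # sum x^0 = N and sum x are exact
    for j in range(2, s + 1):
        P = sum(m ** j for m in marks)                        # sum x'^j
        Q = sum(x * m ** (j - 1) for x, m in zip(xs, marks))  # sum x x'^(j-1)
        corr = sum(comb(j, 2 * t) * (2 * t - 1) * R(2 * t, k) * U[j - 2 * t]
                   for t in range(1, j // 2 + 1))
        U.append(-(j - 1) * P + j * Q + corr)
    return U[s]

data = [3, 7, 8, 12, 15, 15, 21, 30]   # hypothetical integer variates
k = 5
for s in range(8):
    est = [method4(s, data, [class_mark(x, k, sh) for x in data], k) for sh in range(k)]
    assert sum(est) / k == sum(x ** s for x in data)
print("formula (14) averages to the exact power sums")
```

For $s = 2$ the recursion reduces to formula (11), and for $s = 3, 4$ it reproduces (12).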

It is possible to obtain better unbiased estimates if we use in addition the values of $\sum x^2 x'^{s-2}$. In this case the values of $\sum x$ and $\sum x^2$ are known exactly, and after expansion of $\sum (x' + e)^s$, replacement of $e$ by $x - x'$ and of $e^2$ by $(x - x')^2$, and further reduction, we get

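$$\begin{aligned}
E^{-1}\left[\sum x^3\right] &= \sum x'^3 - 3\sum x x'^2 + 3\sum x^2 x' \\
E^{-1}\left[\sum x^4\right] &= 3\sum x'^4 - 8\sum x x'^3 + 6\sum x^2 x'^2 - 3NR_4 \\
E^{-1}\left[\sum x^5\right] &= 6\sum x'^5 - 15\sum x x'^4 + 10\sum x^2 x'^3 - 15R_4 \sum x \\
E^{-1}\left[\sum x^6\right] &= 10\sum x'^6 - 24\sum x x'^5 + 15\sum x^2 x'^4 - 45R_4 \sum x^2 - 10NR_6
\end{aligned} \tag{15}$$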





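and in general,

$$E^{-1}\left[\sum x^s\right] = \tfrac{1}{2}(s - 1)(s - 2)\sum x'^s - s(s - 2)\sum x x'^{s-1} + \tfrac{1}{2}s(s - 1)\sum x^2 x'^{s-2} - \sum_{t=2}^{[s/2]} \binom{s}{2t}(2t - 1)(t - 1) R_{2t}\, E^{-1}\left[\sum x^{s-2t}\right]. \tag{16}$$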


Compound formulas involving additional quantities such as $\sum x^3 x'^{s-3}$, $\sum x^4 x'^{s-4}$, etc., can be worked out by the methods outlined above.
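The method 5 formula (16) can be checked in the same way. The sketch below makes the same assumptions as before (integer variates with $h = 1$, classes of $k$ consecutive integers, hypothetical data and helper names). It also uses $\sum x^2 x'^{j-2}$ and starts from the three exact sums $N$, $\sum x$ and $\sum x^2$. Averaging the estimate over the $k$ groupings recovers $\sum x^s$ exactly.

```python
from fractions import Fraction
from math import comb

def R(t, k):
    """t-th moment of the grouping errors (2i - 1 - k)/2, i = 1..k (h = 1)."""
    return sum(Fraction(2 * i - 1 - k, 2) ** t for i in range(1, k + 1)) / k

def class_mark(x, k, shift):
    """Mid-value of the class of k integers containing x."""
    return x - (x - shift) % k + Fraction(k - 1, 2)

def method5(s, xs, marks, k):
    """E^{-1}[sum x^s] from sum x'^j, sum x x'^(j-1), sum x^2 x'^(j-2), eq. (16)."""
    U = [Fraction(len(xs)), Fraction(sum(xs)), Fraction(sum(x * x for x in xs))]
    for j in range(3, s + 1):
        P = sum(m ** j for m in marks)
        Q = sum(x * m ** (j - 1) for x, m in zip(xs, marks))
        T = sum(x * x * m ** (j - 2) for x, m in zip(xs, marks))
        corr = sum(comb(j, 2 * t) * (2 * t - 1) * (t - 1) * R(2 * t, k) * U[j - 2 * t]
                   for t in range(2, j // 2 + 1))
        U.append((j - 1) * (j - 2) // 2 * P - j * (j - 2) * Q
                 + j * (j - 1) // 2 * T - corr)
    return U[s]

data = [3, 7, 8, 12, 15, 15, 21, 30]   # hypothetical integer variates
k = 5
for s in range(9):
    est = [method5(s, data, [class_mark(x, k, sh) for x in data], k) for sh in range(k)]
    assert sum(est) / k == sum(x ** s for x in data)
print("formula (16) averages to the exact power sums")
```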


5. Computational methods. It has been shown in sections 3 and 4 how the unbiased estimates can be obtained from the biased estimates. It is the purpose of the present section to show how these biased estimates can be computed efficiently. One method of calculation was shown in Table III. The values of $f_{x'}$, $x_{x'}$, $x^2_{x'}$ were computed and recorded, and the resulting power sums obtained. This is the most direct means of computation, and if the number of groups is small and a modern computing machine equipped with automatic positive and negative multiplication is available, it may be the preferred method. It should be noted that the values of $x^2_{x'}$ in Table III are obtained most easily with the use of a machine which permits the calculation of the square with a single key-punching operation.

It is customary to use the devices of subtraction of a constant (either a central class mark or the smallest class mark) and division by a constant (the size of the class interval) to simplify the computational work. Thus in Table III we could use the transformation

$$d' = \frac{x' - (-37)}{9}$$

and compute the values of $\sum d'^s F_x$. If $F_x$ is the frequency function, we have the usual formulas, but if $F_x$ is $x_{x'}$ or $x^2_{x'}$, then the results are terms of the type $\sum x d'^{s-1}$ or $\sum x^2 d'^{s-2}$. It is possible to reduce these to equivalent variables by the use of

$$d = \frac{x - (-37)}{9},$$

so that the values of $\sum d'^s$, $\sum d d'^{s-1}$, $\sum d^2 d'^{s-2}$ result. We then correct for bias with the use of the formulas of sections 3 and 4, where the power sums of the rectangular distribution are computed with $h = 1/k$.

Another method which in many cases is preferable to that just described is the method of cumulative totals. The values of $f_{x'}$, $x_{x'}$, and $x^2_{x'}$ are cumulated successively for the different values of $x'$, and the values of the biased grouping estimates are obtained immediately from the entries in the last few rows. The cumulations of Table III are shown in Table V. The entries in the column of the highest cumulations of $f_{x'}$, $x_{x'}$, $x^2_{x'}$, with the exception of those at the bottom of the column, need not be recorded.

TABLE V

Computation of Biased Estimates Using Cumulative Totals

It is possible to provide multipliers for these entries by an adaptation of a method given in an earlier paper [3]. A table of multipliers has a top marginal row composed of $a$, $a + k$, $a + 2k$, etc., and a left marginal column composed of $k - a$, $2k - a$, etc. The first row in the table is composed of $1$, $k - a$, $(k - a)^2$, $(k - a)^3$, etc., and the first column of $1$, $a$, $a^2$, $a^3$, etc. Each entry in the table is found by adding the product of the entry above it and the column heading to the product of the entry at the left and the row heading. The multipliers for $a = -37$ are shown in Table VI.

The diagonal terms are the multipliers of the values of a given cumulation. Thus the multipliers of the bottom entries of the columns of each of the three sets of cumulations of Table V are successively 1; -37, 46; 1369, -3323, 2116; etc.
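The construction rule for the multiplier table is mechanical, so it is easy to sketch. The function below is one illustrative reading of that rule. Its name and the triangular layout (row $i$ holding one fewer entry than row $i - 1$) are assumptions. The column headings are $a, a + k, \ldots$ and the row headings are $k - a, 2k - a, \ldots$. With $a = -37$ and $k = 9$ its diagonals reproduce the multipliers quoted above.

```python
def multiplier_table(a, k, size):
    """Triangular multiplier array built by the rule in the text (illustrative)."""
    col_heads = [a + j * k for j in range(size)]          # a, a + k, a + 2k, ...
    row_heads = [(i + 1) * k - a for i in range(size)]    # k - a, 2k - a, ...
    table = [[0] * (size - i) for i in range(size)]
    for i in range(size):
        for j in range(size - i):
            if i == 0 and j == 0:
                table[i][j] = 1
                continue
            above = table[i - 1][j] * col_heads[j] if i > 0 else 0
            left = table[i][j - 1] * row_heads[i] if j > 0 else 0
            table[i][j] = above + left
    return table

T = multiplier_table(-37, 9, 5)
for d in range(3):                       # successive diagonals
    print([T[d - j][j] for j in range(d + 1)])
# [1]
# [-37, 46]
# [1369, -3323, 2116]
```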

This method is ideally adapted to the use of Hollerith cards. The information is punched on the cards to the number of places desired. The computational grouping is then accomplished by sorting. As an illustration we take the records of the weights of 1000 students as reported by Carver [4] when measured to the nearest pound. The value of $\sum x$ is 139,288 lb. and that of $\sum x^2$ is 19,692,450 (lb.)$^2$, and we wish to obtain approximations to these values by grouping. If we let the grouping intervals be 80-89, 90-99, etc., with class marks $x' = 84.5$, $94.5$, $104.5$, etc., we would find by the usual methods $\sum x' = 139{,}520$ lb. and $\sum x'^2 = 19{,}760{,}430$ (lb.)$^2$. However, it is possible to wire in the three-place number $x$ and to get from the same number of groups $\sum x = 139{,}288$ lb. and $\sum x x' = 19{,}722{,}326$ (lb.)$^2$. The unbiased values for method 1, 2, or 4 can be computed with the appropriate formulas of sections 3 and 4.

TABLE VI

Multipliers when a = -37 and k = 9

            -37          -28          -19          -10           -1
 46           1           46        2,116       97,336    4,477,456
 55         -37       -3,323     -222,969  -13,236,655
 64       1,369      180,660   15,798,651
 73     -50,653   -8,756,149
 82   1,874,161

TABLE VII

Hollerith Illustration

Smallest x    C(f_x')    C(x_x')    C(y_x')
210               2          422      1,387
200               6        1,242      4,151
190              10        2,017        ...
180              21        4,035     14,733
170              48        8,727     33,675
160             110       18,923     76,565
150             250       40,400    173,461
140             458       70,392    310,073
130             719      105,431        ...
120             900      127,990    613,763
110             980      137,203    660,088
100             999      139,199    678,290
 90             999      139,199    678,290
 80           1,000      139,288    678,896
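The grouped sums quoted in the text can be recovered from the cumulative frequencies of Table VII. The three single-grouping estimates of $\sum x^2$ then follow from (8), (9) and (11). The sketch below is a minimal illustration. It assumes $k = 10$ one-pound values per class ($h = 1$, so $R_2 = 99/12$) and takes $\sum x x' = 19{,}722{,}326$, the value used in the correlation example of section 7.

```python
from fractions import Fraction

# Table VII: smallest weight in each class and cumulative frequency C(f_x'),
# accumulated from the heaviest class downwards.
lower = [210, 200, 190, 180, 170, 160, 150, 140, 130, 120, 110, 100, 90, 80]
cum_f = [2, 6, 10, 21, 48, 110, 250, 458, 719, 900, 980, 999, 999, 1000]

freqs = [c - prev for c, prev in zip(cum_f, [0] + cum_f[:-1])]
marks = [L + Fraction(9, 2) for L in lower]                # x' = 214.5, ..., 84.5
N = sum(freqs)
sum_x1 = sum(f * m for f, m in zip(freqs, marks))          # sum x'
sum_x1_sq = sum(f * m * m for f, m in zip(freqs, marks))   # sum x'^2

sum_x_x1 = 19_722_326            # sum x x' from the Hollerith run
R2 = Fraction(10 ** 2 - 1, 12)   # k = 10 one-pound values per class, h = 1

m1 = sum_x1_sq - N * R2                   # method 1, eq. (8)
m2 = sum_x_x1                             # method 2, eq. (9)
m4 = -sum_x1_sq + 2 * sum_x_x1 + N * R2   # method 4, eq. (11)
print(N, sum_x1, sum_x1_sq)   # 1000 139520 19760430
print(m1, m2, m4)             # 19752180 19722326 19692472
```

Written this way, the method 4 estimate is identically $2m_2 - m_1$, which makes plain how it combines the two single-grouping estimates.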






The Hollerith run is shown in Table VII, where the first column indicates the smallest variate in the class rather than the class mark. The next columns show $C(f_{x'})$ and $C(x_{x'})$. The fourth column, $C(y_{x'})$, is discussed in a later section.

The values for method 3 and method 5 cannot be obtained so readily, since the quantities to be grouped are the $x^2$'s and these do not appear on the card. However, it is possible to use a multiplying punch, or to use a table of squares in the form of prepunched cards, to get these values of $x^2$ on the cards. It might be preferable, in some cases, to do this work and then to use a coarser grouping than would be used otherwise.


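6. Moments. The formulas (7), (8), (9), (10) give moment formulas if the proper values of $F_x$ are assigned. We let $\nu_p = \sum x'^p / N$ and $\nu_{pq} = \sum x'^p x^q / N$, write $\mu_p = \sum x^p / N$ for the ungrouped moments, and have, in case $F_x = 1/N$ in (7), the usual formulas

$$\begin{aligned}
E^{-1}[\mu_1] &= \nu_1 \\
E^{-1}[\mu_2] &= \nu_2 - R_2 \\
E^{-1}[\mu_3] &= \nu_3 - 3R_2\nu_1 \\
E^{-1}[\mu_4] &= \nu_4 - 6R_2\nu_2 + (6R_2^2 - R_4).
\end{aligned} \tag{17}$$

If $F_x = x/N$ we have

$$\begin{aligned}
E^{-1}[\mu_1] &= \mu_1 \\
E^{-1}[\mu_2] &= \nu_{11} \\
E^{-1}[\mu_3] &= \nu_{21} - R_2\mu_1 \\
E^{-1}[\mu_4] &= \nu_{31} - 3R_2\nu_{11},
\end{aligned} \tag{18}$$

while if $F_x = x^2/N$ we have

$$\begin{aligned}
E^{-1}[\mu_2] &= \mu_2 \\
E^{-1}[\mu_3] &= \nu_{12} \\
E^{-1}[\mu_4] &= \nu_{22} - R_2\mu_2.
\end{aligned} \tag{19}$$

Similar formulas can be written for methods 4 and 5.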

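Previous to Carver's article in 1930 it was assumed that central moments could be used in place of moments in formulas (17) without introducing bias, but this article demonstrated that estimates obtained in this way are slightly biased. Thus, writing $p_2 = \nu_2 - \nu_1^2$ for the grouped central second moment and $\sigma^2(\nu_1)$ for the variance of $\nu_1$ over the different groupings,

$$\begin{aligned}
E(p_2) = E(\nu_2 - \nu_1^2) &= E(\nu_2) - E(\nu_1^2) \\
&= \mu_2 + R_2 - E(\nu_1^2) = \mu_2 + R_2 - \left[\sigma^2(\nu_1) + \mu_1^2\right] \\
&= (\mu_2 - \mu_1^2) + R_2 - \sigma^2(\nu_1),
\end{aligned}$$

so that

$$E^{-1}\left[\mu_2 - \mu_1^2\right] = p_2 - R_2 + \sigma^2(\nu_1),$$

and so $p_2 - R_2$ is a biased estimate of $\mu_2 - \mu_1^2$.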





The general question of unbiased estimates of the central power sums and the central moments is one which has been studied for the conventional case by Pierce [2] and Craig [5]. The more general discussion resulting from the introduction of the new methods is one which may well be deferred to a later paper. It is interesting to note in passing that the estimate of the variance obtained by substituting central moments for moments in method 2 is not biased, since, with $p_{11} = \nu_{11} - \nu_1\mu_1$,

$$E(p_{11}) = E[\nu_{11} - \nu_1\mu_1] = \mu_2 - \mu_1\mu_1 = \mu_2 - \mu_1^2.$$
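Carver's point, and the contrasting behaviour of method 2, can be confirmed exactly by averaging over the $k$ groupings. The sketch below is a minimal illustration under the earlier assumptions: integer variates, $h = 1$, classes of $k$ consecutive integers, and hypothetical data and helper names. It checks that $E(p_2) = (\mu_2 - \mu_1^2) + R_2 - \sigma^2(\nu_1)$ and $E(p_{11}) = \mu_2 - \mu_1^2$.

```python
from fractions import Fraction

def class_mark(x, k, shift):
    """Mid-value of the class of k integers containing x (h = 1)."""
    return x - (x - shift) % k + Fraction(k - 1, 2)

def mean(values):
    return sum(values) / len(values)

data = [3, 7, 8, 12, 15, 15, 21, 30]      # hypothetical integer variates
k, N = 5, len(data)
R2 = Fraction(k * k - 1, 12)
mu1 = Fraction(sum(data), N)
mu2 = Fraction(sum(x * x for x in data), N)
central = mu2 - mu1 ** 2                   # ungrouped central second moment

nu1, p2, p11 = [], [], []
for shift in range(k):                     # the k equally likely groupings
    marks = [class_mark(x, k, shift) for x in data]
    v1 = mean(marks)
    v2 = mean([c * c for c in marks])
    v11 = mean([x * c for x, c in zip(data, marks)])
    nu1.append(v1)
    p2.append(v2 - v1 ** 2)                # grouped central moment
    p11.append(v11 - v1 * mu1)             # method-2 analogue

var_nu1 = mean([v * v for v in nu1]) - mean(nu1) ** 2
assert mean(p2) == central + R2 - var_nu1  # p2 - R2 is biased by -var(nu1)
assert mean(p11) == central                # no bias for method 2
print("bias of p2 - R2:", mean(p2) - R2 - central)
```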

It is to be noted that the formulas previously used give correct results when the adjustments are defined to make the power sums and the moments, rather than the central power sums and the central moments, unbiased with respect to grouping. A sensible method of procedure in such a case is to make the correction on the power sum as soon as it is computed.

7. Product moments. Correlation. The introduction of additional variables opens up a variety of situations, since each of the variables may be grouped in different ways. Of these situations, one is immediately solved with the use of the formulas of section 3, and that is the case when one of the variables is not grouped. Let $y$ be the ungrouped variable and let $y_{x'}$ be the sum of all the values of $y$ having the same grouped $x$ value, $x'$. This situation is frequently encountered when using Hollerith cards, as it is only necessary to wire in the whole variable $y$ and take totals when the smallest value of $x$ in the group is attained. Thus in Table VII, the value of $C(y_{x'})$ can be obtained simultaneously with the values of $C(f_{x'})$ and $C(x_{x'})$. Additional cumulations $C(z_{x'})$, $C(u_{x'})$, etc., could be obtained at the same time. It follows from Table VII that

$$\sum y = 678{,}896, \qquad E^{-1}\left[\sum xy\right] = \sum x'y = 94{,}929{,}322.$$

The actual value of $\sum xy$ is 94,774,336.

The general development of the theory of unbiased estimates of product moments is too extensive to be inserted here, but a brief outline might be indicated. We let the grouping errors be $e = x - x'$ and $\eta = y - y'$. Then the generalization of the lemma (4) is

$$E\left[\sum_{xy} e^b \eta^d F_x G_y\right] = R_{b0} R_{0d} \sum_{xy} F_x G_y, \tag{21}$$

where $R_{b0}$ is the $b$th central moment of the rectangular distribution consisting of $e$'s and $R_{0d}$ is the $d$th central moment of the rectangular distribution consisting of $\eta$'s. This is applied in turn to the terms of the expansion of $\sum x'^r y'^s F_x G_y$. For example

$$\begin{aligned}
E\left[\sum x'y' F_x G_y\right] &= E\sum (x - e)(y - \eta) F_x G_y \\
&= \sum xy F_x G_y - R_{10}\sum y F_x G_y - R_{01}\sum x F_x G_y + R_{10}R_{01}\sum F_x G_y,
\end{aligned} \tag{22}$$

and if $F_x = 1$ and $G_y = 1$, we have, since $R_{10} = R_{01} = 0$,

$$E\left[\sum x'y'\right] = \sum xy, \quad \text{so that} \quad E^{-1}\left[\sum xy\right] = \sum x'y'. \tag{23}$$

If we use the customary device of correcting the moments for bias, rather than the central moments or the ratio which is the correlation coefficient, we have the usual formula for correction of the correlation coefficient, in which the numerator term is not corrected for bias but the values in the denominator are corrected.

The use of method 2 gives $\sum x'y$ and $\sum xy'$ as unbiased estimates of $\sum xy$. It has been pointed out that these quantities $\sum x'y$ and $\sum xy'$ are readily obtained when the actual values of $x$ and $y$ are punched on Hollerith cards. Each is in general a better estimate of $\sum xy$ than is $\sum x'y'$, since one of the values in the product, in each case, involves no approximation. An average of these might be taken to obtain a better estimate of $\sum xy$. If the values of $\sum x'$ and $\sum y'$ are also available, it is preferable to use the formula

$$E^{-1}[r] = \frac{\tfrac{1}{2}\left(A_{x'y} + A_{xy'}\right)}{\sqrt{A_{xx'}A_{yy'}}}, \quad \text{where } A_{uv} = N\sum uv - \left(\sum u\right)\left(\sum v\right). \tag{24}$$

The 1000 cards of weights and heights were used in this way with the digits grouped. There resulted (dimensions omitted)

$$\begin{aligned}
N &= 1{,}000 & & \\
\sum x &= 139{,}288 & \sum y &= 678{,}896 \\
\sum x' &= 139{,}520 & \sum y' &= 679{,}420 \\
\sum x x' &= 19{,}722{,}326 & \sum y x' &= 94{,}929{,}322 \\
\sum x y' &= 94{,}848{,}036 & \sum y y' &= 461{,}885{,}052
\end{aligned}$$

which gives $E^{-1}[r] = .4957$. The ungrouped four-place value is .4952.
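Formula (24) can be applied directly to the sums listed above. The sketch below is a minimal illustration; the helper `A` mirrors the definition of $A_{uv}$ in (24), and its name is otherwise illustrative. It reproduces the quoted value.

```python
from math import sqrt

def A(n, s_uv, s_u, s_v):
    """A_uv = N * sum(uv) - (sum u)(sum v), as defined under (24)."""
    return n * s_uv - s_u * s_v

N = 1000
sx, sx1, sxx1, sxy1 = 139_288, 139_520, 19_722_326, 94_848_036
sy, sy1, syx1, syy1 = 678_896, 679_420, 94_929_322, 461_885_052

num = (A(N, syx1, sx1, sy) + A(N, sxy1, sx, sy1)) / 2   # average of A_{x'y} and A_{xy'}
den = sqrt(A(N, sxx1, sx, sx1) * A(N, syy1, sy, sy1))   # sqrt(A_{xx'} A_{yy'})
r = num / den
print(round(r, 4))   # 0.4957
```

All the $A$ values are exact integers, so floating point enters only in the final square root and division.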

For use without Hollerith machines, this method indicates the recording of the values of $x_{x'y'}$ and $y_{x'y'}$ as well as $f_{x'y'}$ for each entry in the correlation chart.

The generalization of method 4 leads to

$$\sum xy = \sum (x' + e)(y' + \eta) = -\sum x'y' + \sum x'y + \sum xy' + \sum e\eta,$$

so that we have

$$E^{-1}\left[\sum xy\right] = -\sum x'y' + \sum xy' + \sum x'y. \tag{26}$$
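Both (23) and (26) are unbiased when the expectation runs over every pair of groupings of $x$ and $y$. The sketch below checks this exactly under stated assumptions. The $x$ and $y$ values are hypothetical integers, grouped independently into classes of $k_x = 5$ and $k_y = 4$ consecutive integers ($h = 1$), and the helper name is illustrative.

```python
from fractions import Fraction
from itertools import product

def class_mark(v, k, shift):
    """Mid-value of the class of k integers containing v (h = 1)."""
    return v - (v - shift) % k + Fraction(k - 1, 2)

pairs = [(3, 40), (7, 45), (8, 41), (12, 52), (15, 50), (21, 58), (30, 61)]  # hypothetical (x, y)
kx, ky = 5, 4                        # assumed class sizes for x and y
exact = sum(x * y for x, y in pairs)

usual, compound = [], []
for sx, sy in product(range(kx), range(ky)):   # every pair of groupings
    g = [(x, y, class_mark(x, kx, sx), class_mark(y, ky, sy)) for x, y in pairs]
    s_x1y1 = sum(x1 * y1 for x, y, x1, y1 in g)    # sum x'y'
    s_xy1 = sum(x * y1 for x, y, x1, y1 in g)      # sum x y'
    s_x1y = sum(x1 * y for x, y, x1, y1 in g)      # sum x'y
    usual.append(s_x1y1)                           # estimate (23)
    compound.append(-s_x1y1 + s_xy1 + s_x1y)       # estimate (26)

n = kx * ky
assert sum(usual) / n == exact
assert sum(compound) / n == exact
print("exact:", exact)
```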

It is to be noted that the quantity $\sum x'y'$ is the unbiased estimate of $\sum xy$ resulting from the usual frequency distribution. This formula can be used with formula (11) of section 4 to obtain an estimate of the correlation coefficient.

The correlation chart application of method 4 demands the triple entry $f_{x'y'}$, $x_{x'y'}$, $y_{x'y'}$ for each of the squares of the correlation chart. From these values it is possible to compute all the entries needed to use method 4.




In general the values of $E[\sum x'^r y'^s F_x G_y]$ can be worked out with the repeated use of lemma (21). The reader who understands the developments of sections 3 and 4 should have little difficulty in writing out the formulas resulting here.

It should be pointed out that, in cases where the first and second order moments only are desired, it is frequently advisable to avoid grouping by using modern computing machines and, in this way, to eliminate the trouble and the errors caused by grouping [6].

8. Conclusion. There are additional points which might be considered, but they would take considerable space, and the presentation is now sufficiently complete to enable one to obtain some perspective on the proper use of the new methods.

If precision is not needed, the use of the former grouping methods is advised. But if additional precision is needed, and if the results of a single grouping only are available, it is advised to use the newer methods. Method 2 is much more satisfactory than method 1 and, in many cases, will be sufficient, but if additional precision is demanded, one can use method 3 or one of the compound methods.

In general there are two kinds of groupings. One is a recorded grouping, and expresses the measures in terms of the units which are desired, while the second is a computational grouping which is introduced for ease of computation. Now the recorded grouping, no matter whether obtained from discrete or continuous data, is necessarily discrete. Thus the weights, when measured, have to be recorded to the nearest pound, or the nearest tenth of a pound, or the nearest hundredth of a pound, etc. The formulas to be applied to the results of computational grouping are the formulas for discrete variates. If in addition one wishes to correct continuous data for the recorded grouping, he may then apply the usual Sheppard's corrections for continuous data. However, it is advised to make the recorded grouping sufficiently detailed so that the errors are slight. Thus one might record the values of weights to the nearest tenth of a pound, but use ten-pound intervals in making calculations. In this case the values when corrected for the computational grouping (to the nearest tenth of a pound) would presumably be sufficiently precise so that the additional correction for the recorded grouping would not be necessary. (In many cases the two grouping corrections are combined in a single grouping correction for continuous data.)

It appears that it is not sufficiently satisfactory to continue to record the results of grouping in the usual form of a class mark (or class limits) and a frequency if the results are to be used by others. The table should include an additional column of $x_{x'}$ and preferably a column of $x^2_{x'}$, where the $x'$ are the computational grouped values and the $x$ are the measured values recorded to a considerable degree of accuracy. The arrangement takes little more space than the present frequency distribution, and it can be obtained from the recorded values with a reasonable amount of additional work. In the case of correlation it is suggested that the present grouping of frequencies in the correlation chart be augmented with the values of $x_{x'y'}$ and $y_{x'y'}$ for each square. In this way it is possible for those who may use the distributions later to obtain much better estimates than would be possible from the frequency distributions as now recorded. This point certainly should be considered by all those who prepare tables for general use, and yet are forced by practical considerations to use some sort of grouping in reporting the results.

REFERENCES

[1] H. C. Carver, "The fundamental nature and proof of Sheppard's adjustments," Annals of Math. Stat., Vol. 1 (1930), pp. 154-163.

[2] J. A. Pierce, "A study of a universe of n finite populations with application to moment-function adjustments for grouped data," Annals of Math. Stat., Vol. 11 (1940), pp. 311-334.

[3] P. S. Dwyer, "The computation of moments with the use of cumulative totals," Annals of Math. Stat., Vol. 9 (1938), pp. 288-303.

[4] H. C. Carver, Anthropometric Data, Edwards Brothers, Ann Arbor, Mich., 1941.

[5] C. C. Craig, "Note on Sheppard's corrections," Annals of Math. Stat., Vol. 12 (1941).

[6] P. S. Dwyer, "The calculation of correlation coefficients from ungrouped data with modern calculating machines," Jour. of Am. Stat. Assn., Vol. 35 (1940).

[7] H. L. Jones, "The use of grouped measurements," Jour. of Am. Stat. Assn., Vol. 36 (1941), pp. 525-529.



ON THE CORRECT USE OF BAYES' FORMULA 

By R. v. Mises
Harvard University

The problem that we try to solve by using Bayes' formula consists in making an inference from an observed statistical value upon the unknown value of a parameter, and in examining the chance of this inference being correct. One may call this the principal problem of practical statistics, or the estimation problem, or, as the author put it in German (Rückschluss-Wahrscheinlichkeit), the problem of inference probability; at any rate we encounter this kind of problem in various forms in almost every branch of statistical investigation. It will be convenient to base the following discussion on a concrete question in quite specified form, which will allow us to see more clearly the points that are to be stressed in this paper.


1. The problem. In examining the quality of water supplies with respect to the number of bacteria of a certain kind they contain, a definite procedure is usually adopted. One takes $n = 5$ samples of the water, each sample of exactly 10 ccm. Then by a certain biological test one finds out whether or not each sample contains at least one bacterium of the kind under consideration. The number $x$ (zero to five) of positive tests is the observed value from which an inference is drawn upon the probability $\theta$ for a sample containing at least one bacterium. It is assumed that this $\theta$ is connected with the average number $\lambda$ of bacteria per 10 ccm by

$$\theta = 1 - e^{-\lambda}; \qquad \theta = \theta_1 = 0.63 \text{ for } \lambda = 1, \tag{1}$$

according to Poisson's law. A particular question which we want to answer is this: What is the chance of being right if we conclude from the observed fact $x = 0$ (in other cases from $x = 1$) that $\theta$ lies between 0 and $\theta_1 = 0.63$ (or $\lambda$ between 0 and 1)?

For a given $\theta$ the probability of getting $x$ positive tests out of $n$ tests is, according to Bernoulli's formula,

$$p(x \mid \theta) = \binom{n}{x} \theta^x (1 - \theta)^{n-x}. \tag{2}$$

The chance of having a $\theta$-value between 0 and $\theta_1$ when $x$ positive tests are observed is, according to Bayes' formula,

$$P_x(\theta_1) = \frac{\int_0^{\theta_1} p(x \mid \theta)\, dP(\theta)}{\int_0^1 p(x \mid \theta)\, dP(\theta)}, \tag{3}$$


where $P(\theta)$ is a distribution function, monotonically increasing from 0 to 1 and usually known as the a priori probability.


2. The a priori. The function $P(\theta)$ is generally considered a troublemaker. As one is accustomed to calling $P$ the a priori probability, most people think that it has something to do with those absurd conceptions of non-empirical, a priori known probabilities that cannot be tested by any experiments, etc. This cannot be refuted strongly enough. In our particular case the meaning of $P(\theta)$ is the following. Each probability statement refers, as we know, to a certain infinite sequence of experiments or trials which form a kollektiv. If we ask for the chance $P_x(\theta_1)$ of having a $\theta$-value between zero and $\theta_1$ when a certain $x$ has been observed, we have in mind a sequence of trials each consisting of two steps: first, picking out one particular water supply, and then testing the number $x$ of samples that contain bacilli. Among the first $N$ trials of this kind we shall have $N_1$ cases where the $\theta$-value for the water supply picked out lies between 0 and $\theta_1$; then we shall have $N_x$ cases where the number of positive tests is $x$; and finally in a number $N_{1x}$ of cases both conditions will be fulfilled. The chance $P_x(\theta_1)$ we ask for is then by definition

$$P_x(\theta_1) = \lim_{N \to \infty} \frac{N_{1x}}{N_x}, \tag{4}$$

while the so-called a priori probability is

$$P(\theta_1) = \lim_{N \to \infty} \frac{N_1}{N}. \tag{5}$$

Later on we shall also use the probability

$$Q_x = \lim_{N \to \infty} \frac{N_x}{N}. \tag{6}$$


All these magnitudes are to the same extent empirical or non-empirical. They are "empirical," since we get approximate values for them out of a long sequence of experiments, and they may be considered as something super-empirical, since the concepts of an infinite sequence and of a limit are used in the definition, as each theory must involve a certain amount of "idealization."

In order to avoid the above-mentioned equivocation, the author suggested a long time ago¹ that the probabilities corresponding to $P(\theta)$ and $P_x(\theta)$ be called respectively the initial and the final probability. Another expression which could be used in connection with the distribution function $P(\theta)$ is overall distribution, since it means the distribution of $\theta$-values within the total mass of samples, regardless of what the values of $x$ are in each case.


¹ Cf. reference [2], p. 162.

3. No randomness required. Now, the first remark we have to make is the following: in Bayes' formula (3) the existence of a function $P(\theta)$ is presupposed, i.e. we assume that in the sequence of successive trials the frequency of those cases in which $\theta$ falls into a certain region has a definite limit. But nothing is assumed about this limit being independent of a place selection. The sequence of trials must fulfill the first condition of a kollektiv with respect to $\theta$, but not the second; in other words, randomness in the succession of $\theta$-values is not required. Thus we may say that $\theta$ is not supposed to be a chance variable in the usual sense of this term. Sometimes people are shocked by the idea that in Bayes' theory the individual cases are supposed to be picked out at random, and it is often considered a superiority of the method of confidence intervals that there such an assumption is avoided.

It is true that in the latter method even the existence of the frequency limit is not required,² but this does not seem to make any essential difference. The fact is that, if we want to make an inference upon the value of $\theta$, i.e. an assertion about the chance of $\theta$ falling into a certain interval, we have to assume that in the long run different $\theta$-values may occur with certain frequencies.

It may be useful to have different expressions for the two cases where a frequency limit is or is not supposed to be independent of an arbitrary place selection. As we use the word probability in the first case, it seems suitable to apply the word chance in the second. Thus, if $P(\theta)$ is the initial or the overall chance of $\theta$, we would say that $P_x(\theta_1)$ is the final chance of $\theta$ being smaller than or equal to $\theta_1$ for a certain observed $x$-value. When $P(\theta)$ is supposed to be a probability, i.e. to fulfill the condition of randomness, then $P_x(\theta_1)$ will have this property too and has to be called probability.


4. Inequalities for the final chance $P_x(\theta)$. A much better founded objection against the practical application of Bayes' formula consists in saying that in most cases we have no sufficient information about the function $P(\theta)$. This undeniable fact often leads to an incorrect simplification of the formula by replacing $dP(\theta)$ in it by $d\theta$, which means an a priori probability of constant density. It is obvious that to assume $P(\theta)$ equal to $\theta$ when you do not know what $P(\theta)$ is, is no solution. On the other hand, if we accept Bayes' formula as correct (and there is no reason for not doing so), we learn that the value $P_x(\theta)$ we ask for depends essentially on $P(\theta)$, and is undetermined as far as $P(\theta)$ is undetermined. The only consequence in this situation is, first, to use all information we can get about $P(\theta)$, and then to make the answer as vague or undetermined as the incompleteness of this information requires.

One way to do this consists in setting up inequalities for $P_x(\theta)$ based on certain inequalities for $P(\theta)$. A formula which turns out to be useful, at least in a well-known asymptotic problem, is the following:

Let us consider the general case where $\theta$ stands for several variable parameters. Let $A$ be the set of all possible values of $\theta$. We are interested in the final probability $P_C$ of a subset $C$ of $A$ given by

$$P_C = \frac{\int_C p(x \mid \theta)\, dP(\theta)}{\int_A p(x \mid \theta)\, dP(\theta)},$$

where $x$ is supposed to be known.

² Cf. reference [4], p. 201.

Let $P'_C$ be the value of $P_C$ under the assumption of a constant initial density, and denote by $P_B$, $P'_B$ the analogous values for a subset $B$ which includes $C$, so as to have

$$C \subset B \subset A. \tag{7}$$

The quantities $P'_B$ and $P'_C$ depend only on the function $p(x \mid \theta)$ and the sets $B$ and $C$, while $P_B$ and $P_C$ change with $P(\theta)$.

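If we assume that the initial density $p(\theta)$ has the limits

$$m \le p(\theta) \le M \ \text{within } B, \qquad m' \le p(\theta) \le M' \ \text{within } A - B \ (A \text{ minus } B), \tag{8}$$

it can easily be shown that

$$\frac{m}{M}\cdot\frac{1}{P'_B + \frac{M'}{M}(1 - P'_B)} \le \frac{P_C}{P'_C} \le \frac{M}{m}\cdot\frac{1}{P'_B + \frac{m'}{m}(1 - P'_B)}. \tag{9}$$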

We may consider the following application of these inequalities.

If we are concerned with a case where a great number $n$ of trials is involved, the function $p(x \mid \theta)$, which determines the $P'$ values, shows an increasing concentration at a certain point of the set $A$. In other words, for large $n$ we have a subset $B$, more and more reducing to one single point, for which $P'_B$ is as near to 1 as we want. If we then assume that the density $p(\theta)$ is continuous and bounded, the difference between $m$ and $M$ tends to zero, and if $m$ is supposed to have a positive lower bound, both the first and the last expression in (9) tend to unity, or $P_C$ approaches $P'_C$. This is a generalized form of the statement, which the author proved for the first time in 1919,³ that in the original Bayes' problem, where we are concerned with $n$ repeated observations of an alternative, the final probability becomes more and more independent of the initial probability $P(\theta)$ as the number $n$ of observations involved increases.

³ Cf. reference [1], p. 81.

5. Using previous experience. The inequalities (9) may be of use in many cases. But, to be sure, in general they are not the basis upon which practical estimation judgments rest. Everybody acquainted with the conditions of testing water supplies takes it for granted that the outcome $x = 0$ (no positive test) supplies a sufficient reason for the statement $\theta \le \theta_1 = 0.63$ (less than one bacterium per 10 cc). But if nothing were known about the initial distribution $P(\theta)$, we could assume $P(\theta)$ in the form

$$P(\theta) = \theta^m, \qquad p(\theta) = m\theta^{m-1} \quad \text{for } 0 \le \theta \le 1,$$

with a large value of $m$. With $n = 5$, $x = 0$, equations (2) and (3) give $P_0(\theta_1) = 0.50$ for $m = 10$, and $P_0(\theta_1) = 0.88$ if $m$ is 5. These values are much too low to justify any recommendation of a water supply for which $x$ was found to be zero. Thus we have to ask: What is the real source of the confidence we put in the inference from $x = 0$ upon $\theta \le \theta_1$?
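With the initial distribution $P(\theta) = \theta^m$, formula (3) can be evaluated in closed form. The posterior density is proportional to $\theta^{x+m-1}(1-\theta)^{n-x}$, a Beta$(x + m,\, n - x + 1)$ law. For integer parameters its distribution function equals an upper binomial tail. The sketch below is a minimal illustration that uses $\theta_1 = 0.63$ as in the text; the function name is illustrative. For $m = 10$ the chance comes out close to 0.50, and for $m = 5$ close to 0.88.

```python
from math import comb

def final_chance(theta1, n, x, m):
    """P_x(theta1) from (2)-(3) when the initial distribution is P(theta) = theta**m.

    The posterior is Beta(x + m, n - x + 1); for integer parameters its
    distribution function at theta1 equals P(Binomial(a + b - 1, theta1) >= a).
    """
    a, b = x + m, n - x + 1
    trials = a + b - 1
    return sum(comb(trials, j) * theta1 ** j * (1 - theta1) ** (trials - j)
               for j in range(a, trials + 1))

for m in (10, 5):
    print(m, round(final_chance(0.63, n=5, x=0, m=m), 2))
# 10 0.5
# 5 0.88
```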

There is no doubt that this confidence is based on previous experience. We know that the water supplies subjected to the routine test in the past formed a class of rather clean than dirty water, and we rely on a new sample belonging to the same class. The author was given the following information about the results under the jurisdiction of Massachusetts during the last decade. Out of a total of $N = 3420$ examinations there were found

3086 cases with x = 0 (no positive test)
279 cases with x = 1 (one positive test)
32 cases with x = 2
15 cases with x = 3
5 cases with x = 4
3 cases with x = 5

The overwhelming majority of cases with $x = 0$ is evident. The question is only how we can use these statistics of past experiments for obtaining a numerical inference upon the value of $P_x(\theta)$.

If the initial distribution P(0) were known, wc could find the probability Q, 
of getting x positive tests out of n; 

(10) Q. - jf P(* I 8) dP(6 ) = Q jf 0*(1 - 0)"“* dPW. 

Using the numbers Ni , N*, N u introduced in section 2 the probability Q{x) 
is defined by equation (6). 

If the number N of past examinations is considered as sufficiently large, wo 
can take the ratios 3086/3420, 279/3420 etc. as approximate values for , 
Q\ etc. Now, according to the well-known identities 

(11) ;S *0'“-«“-*• 

iFij s 1(1 -»(;)«■» - *)■- - 


(12) 



BAYES* FORMULA 


161 


and using (10) we can derive from the values Qa ,Qi, ■ ■ ■ ,Q n the first and second 
moments of the distribution function P(9). 


(13) 


Mt = f 1 e dP(e) =- it Qt 

71 loaO 

Mu = f e 1 dP(e) = 

Jo 


X) *(* - 1)0* . 

n[n — 1) 


If we introduce here the above mentioned empirical ratios for Q* we find the 
approximate values for the first and second moments of P(9): 


(130 


Mi = 0.02174 Mu = 0.00401. 


6. Determination of a distribution function by its first moments. In an 
earlier paper the author showed [3] how the exact upper and lower bounds for a 
distribution function P(0) can be found, if the expected values of two functions 
f(9) and g(9) are known. The only condition was that the curve represented 
in a Cartesian coordinate system by x = f(6), y = g(9) is convex. Let us take 

J{9) = g(9) = 0 for 9 < 0 

(14) f(9) = 6, g(9) = 9' for 0 ^ 6 g 1 

f(g) = g (e) = 1 for 6 > 1. 

In this case the condition is fulfilled and the expected values of f(9) and g(9) 
are the moments M x , Mu, respectively. The results obtained in the paper 
quoted above take the following form: 

First, we have to derive from the given values M t and Mu two points 9' and 
9" of the internal 0 g 9 ^ 1 


(15) 


d , = Mi - Mu 


1 - Mi ’ 


an _ Mu 

’ ~sr,- 


Then the limits for P{9) are: 


0 g F(0) g 


Mu - Ml 
Mu - 2il/i 9 + e- 


(16) i - m x - ¥j JI* g p(e) g i - j— s 

V v — 1 


(Ml - 0) 2 


Mu - 2AM + 0 2 


g P(B) g 1 


for 0 g 6 <> 9’ 
for 0' ^ 0 g 6" 
for 6" Z 9 £ 1. 


In our case we find 9' = 0.0213, 9" = 0.1619 and the point 0i = 0.6321 falls 
into the third interval 9", 1. The lines O A B C and O D E F G in Fig. 1 show, 
(slightly distorted) the lower and upper bounds foi P(9). 



162 


B. V, MI8E8 


P(J*1 



0/0* 

aftM 


xzr* 


J 


9 


*-T7* 


1 <f~—x 


Fin. 2 

Fia. 1. Tha limits of the overall distribution function 
Fia. 2. The 99% region in the methods of confidence intervals 


7. Application to Bayes’ formula. The inequalities (16) enable us to find in a 
simple way a lower bound for the end probability P*(0 t ) defined by (2) and (3) 
in the case x = 0. Let us denote by A the numeratoh in (3) and by B the 
supplementary integral 

(17) B s» f p(x| 9) dP(8), 

so as to have A + B for the denominator in (3). If the subscripts min and max 
denote a lower and upper bound respectively we can write 


(18) 


PM) 




A + B d B l, -f Brn»x' 

Now, taking x = 0 we find by product integration 

(19) a = P(e,)( l - eO" + n P(f»(i ~ a)"- 1 de. 

JO 


Therefore, A min is found when we introduce in this expression the lower lim 
for P(9) as given in (16). If we do this and use the values for M, and M t 
according to (13'), numerical computation leads to A mln » 0.712. 

In the same way we obtain B in the form 


( 2 °) P = - P(6 ,)(1 -ed n + n f P(9)(l - 9)’’"' de. 

The upper bound P m „ ia reached, if we introduce in the integral P( 0 ) = 1 and 
in the first term the minimum value for P(fc) following from (16), The second 




BAYES' FORMULA 


163 


term becomes thus equal to (1 — 8,)" and the numerical result is B miX = 
0.0000607. Therefore the inequality (18) supplies 

OS') PM i - 0.99915. 

The final outcome secured m this way can be formulated as follows: If we 
assume that in continuing the experiments the distribution of test results will be 
about the same as it has been in the past 3420 cases, we have a chance of more than 
99.9% of being right, when we state m each case of no positive test that the density 
of baclerias is less than 1 per 10 ccm. 

The high value of 99 9% for P{8f) is of course strictly bound to the assump¬ 
tion that the entire mass of water supplies to be tested is homogeneous and 
sufficiently characterized by the distribution of test results found in the past. 
If e.g we had to assume that the six possible values for x (0 to 5) in the long 
mn appear with equal fiequencies so as to have Q 0 = Qi = ■ • ■ Qt = £, the 
same method would give Mi = Mi = then d' = + 6" = f, and the final 
result would be Po(0i) ^ 0.73. The assumption of a constant initial density 
P(0) = 6 would give Po(0i) = Po(0i) = 0.9975, a little less than the value 
found above in (18'). 


8. The case x = 1. The results are less favorable in the case of one positive 
test, x — 1. Here we have 

(21) p(l | 8) = nfl(l - 0) n_1 = 50(1 - 0)\ 

and the derivative of p is first positive, then negative. We can conclude from 
Tig. 1 that the minimum value for A and the maximum for B will be reached 
when the distribution function P(0) is represented by the line 0 D I H J G 
where III is horizontal and H the point on B C with abscissa 6i . The abscissa 
0 O of I is determined by the equation 

/ nr) 5 Mi — M\ __ {M\ — 0i) 2 

k ' Mi - 2Mido + 61 Mi - 2Mi 01 + 05’ 


which supplies 0 q = 0.0190. We then have 


(23) 



P(11 e) dP(B), 


with the value p{ 1 | 0) from (21) and with 


P{6) = 


Mi - M\ 

Mi - 2Mi8 + 0 s 


according to (16). On the other hand is found, as in the former case, to be 
(24) •^miu — pd 10i)U - Pied ], 



1G4 


II. V. MISKS 


where we have to take for Pith) its minimum value according to (16). The 
numerical computation yields * (l.ofir/2 ami B.,, xx - (1 00052 w» as to give 

k- „!?„»n«2. 

Ut .1 

The result, is that under the assumption above mentioned ur have nmrr 
than 92% chance of being right, if we predict each time one out of five tests luus 
been positive that the density of bacilli is less than 1 per 10 com The chance 
computed under the assumption of a uniform initial distribution P(0) n 0 
would be 0.97. 

9. The method of confidence intervals. One may ask what kind of answer 
to our questions can he deduced from the principle of confidence intervals. 
This method has undeniably to its credit that no use is made lie re of the initial 
distribution P(0) and that, therefore, all its statements aie completely inde¬ 
pendent of what is assumed about Pig). 

In order to apply this method 4 we have to select for a given degree of confi¬ 
dence, say a - 0.99, a region of acceptance, i,e. an area in the two dimensional 
x, 0 plane limited by tuo lines x,(0) and x 5 (0) so as to have for each 0 

( 25 ) ITob |.ri(0) £ x & jr»(0)} «■ a. 

The region is, of course, not uniquely determined by (25). In our ease, how¬ 
ever, one will generally agree that the best way to determine the region consists 
in assuming for xi(0) and Xj(0) two step lines with steps at the integer values 

x = 0, 1, 2, • • • as indicated in Fig. 2, Then the formula (2) for p(x 1 0) com¬ 

bined with (25) supplies the abscissae of the steps, if x is given. If we transform 
the limits for 0 into limits for X using equation (1), the final outcome reads as 
follows: 

Whatever the initial distribution P(0) may be, ice have, a chance of 90% of being 
right, if we predict : 

each time x = 0 is observed that X lies between 0 and 0.02, 

each time x ~ I is observed that X lies between 0,002 ami 1.51, 

each time x = 2 is observed that X lies between 0.036 and 2.24, 

each time x = 3 is observed that X lies between 0.112 and 3,41, 

each time x = 4 is observed that X lies between 0.25 arul 8.48, 
each time x = 5 is observed that. X lies between 0.51 and *. 

It is true that in this way we obtain a result independent of any assumption 
on P{6). But it is essential that the chance of « - 99% holds only for the. six 
joint statements as a whole. This moans it may happen that for instance the 
list assertion (that X is smaller than 0,92 in the case x = 0) is correct hut very 
seldom or even nev er, while other assertions (e.g. those for x - 4 and 5) have 

* Cf reference [51 anil reference [4], p. 203. 



165 


BAYES 1 FORMULA 

a much greater chance than 99% of being correct. Whether this happens or 
not depends on the initial distribution P(l ). As long as we know nothing about 
P(0) we are not in the position to conclude, by using the method of confidence 
intervals, that the particular statement i 0.92 if x = 0” has a chance of 
99% or even any chance at all of being coirect, On the other hand, when x = 0 
has been observed we are in no way interested in consequences that may be 
drawn in the case x = 4 or x = 5 or in a set of statements that includes the 
cases x = 4 and % = 5, The only practical question that is relevant to the 
purpose for which the tests are made is this. Wkf can we conclude from ihe fad 
that in a certain instance x = 0 has been observed (or in another instance x = 1)? 
It seems that the method of confidence intervals, discarding any consideration 
of the initial distribution, can supply no contribution towards the answering 
this particular question . 


[1] It v Mires, "Fundamentalsaetze der Wahrschcinliclikeitsreclinung," Math, Zeit,, 

Vol, 4 (1919), pp. 1-97 

[2] It, v Mises, Wahrschemhchkilsrechnung und ike Anwendung m der Slalulik an d 

Theorelwhen Fhysik, Leipzig und Wien, 1931 

[3] R. v. Mises, "The limits of a distribution function If two expected values are given," 

Armais of Math, Mat,, Vol, 10 (1939), pp 99-101 
[4| R, v, Mibes, "On the foundation of probability and statistics," Annals of Math, Slat 
Vol, 12 (1941), pp, 191-205, 

[5] J Nbymau, Roy Slat Soc, Jour,, Vol 97 (1934), pp 590-592 



AN ITERATIVE METHOD OF ADJUSTING SAMPLE FREQUENCY TABLES 
WHEN EXPECTED MARGINAL TOTALS ARE KNOWN 

By Fnimutirk F. Stimim: 

Cornell Vturmihj and f\ S, C<n«u* liurrnu 

I. Introduction. In a previous paper by W. KdvunL i immiK and the 
author [1] the method of least .squares was applied to the adjustment of simple 
frequency tables for which the exported values of (lit 1 marginal total** are known. 
From observations on a sample the fiequenrir- for tlie cell m the /lh row 
and jth column of a two dimensional table and the r row and column total- 1 , 
n, and n,,, are obtained. These frequence- are subject to the mror- of vaudum 
sampling and it is desired to adjust them so that the row and column totals 
will agree with their expected values, m, and in ,, which sue known. The 
adjustment involves the, solution of tin* r l-s- 1 normal equation*-. 

/ •* 1, 2, • • • , r 

.1 ----- 1 . 2 , 1 

where the X are Lagrange, multipliers from which are calculated tin* ndju-ted 
frequencies 

(2) w.j - n„( 1 + X, I X,). 

Similar equations arise in the three dimensional ease. 

A method of iterative proportions was presented for effecting the adjustments 
more conveniently than by solving the normal and condition equations, and it 
was stated that "the final results coincide with the least square* solution,” 
This statement is incorrect, for although the adjusted values sat isfy the condition 
equations, they do not satisfy the normal equations and hence they provide 
only an approximation to the solutions The method of iterative proportions ha#t 
several interesting characteristics that will be discussed in a later section. 
This paper now presents a method that converges to the values given by (he 
least squares adjustment and is self correcting. It can be used with any set of 
data and weights for which a least squares solution exists. The two-dimensional 
case will be considered first. 

2. The two-dimensional case; expected row and column totals known. 
Assume that a sample of n items is drawn at random and omKa-chmsificd in a 
table of r rows and s columns, As in the previous paper, let n,, be the frequency 
in the ith row and jtli column of the two-way frequency distribution. Indicate 
summation by substituting a dot for the letter over which the summation is to 
be performed. Then n { _ and n., are the marginal totals for the ffch row and 
jth column respectively. Let m,*. and m, } be the expected values of these 

366 


( 1 ) 


n,*,X., + X ni,X., « m,, -- , 

J 

X tti'jX,. + n.,X, = MI./ - it,, 



ADJUSTING SAMPLE FREQUENCY TABLES 


167 


mart ,inal totals calculated from other information or from theoretical considera¬ 
tions, and c ,, a set of constants known or estimated to be proportional to the 
recipiocals of the weights of the n,,, i e proportional to their error variances. 
Since the weights are positive, the c t j are non-negative and finite. It is assumed 
that the set of weights is such that for the given data an adjustment exists. 

The least squares adjusted frequencies m ,, can be computed from the given 
numbers c,,, n tJ , . , and m,j by a series of approximate adjustments in a 

manner now to be explained Let be the pth approximation to rtii, . In 
conformity with this notation m.'f = . Let 


(3) . = m», 


( p) 

mir, 


d[ p) = m, — 


„(p) 
m { , 


d\ p) = m,, — mV 


(p) 


be corrections that must be added to the m' v) to produce the least Squares 
adjusted frequencies. As d —> 0, m (p) —> m. Let X, (p> and \\ p) be constants 
determined arbitrarily between the limits set by equations (5) to (7). Any one 
X may be fixed arbitrarily and kept constant through successive approximations. 
Note that X[V = X ( , 0) = 0 and that, if at every step we set X. t , !>) = 0, the X (p> 
are approximations to the Lagrange multipliers in the normal equations. After 
p steps in the iterative process the approximate adjusted frequencies will be 

(4) = n,, + + X (p) ). 


The following conditions, derived from (19), (23), and (24), are sufficient to 
make the successive approximations converge to the least squares adjusted 
frequencies: 

X,?’ = X ^- 11 + d[ p) d\ v ~ l) /ci ., 

(5) xv p) = x. ( r i) + e (p) d. c r i) /c.,, 

(6) o < e[ v \ o < 9j p> + d\ p) < 2 ,- 


and, for at Least one pair ij, 

(7) e\?\d\ry + e\ p \d\ p -y > 0; s'* 5 + 6 {p) < 2. 

The 6’s are introduced because in actual computations the successive approxi¬ 
mations X (p) can only be calculated to a limited number of digits and because 
the adjustment may progress more rapidly if the computer is permitted to use 
his judgment in determining the approximations as he observes the "course of 
previous approximations. 

The process of adjustment is continued until the d\ p) and d {p) becoihe suffi¬ 
ciently small to provide the desired degree of agreement between the adjusted 
and expected row and column totals. 


3. Example. The following example shows the steps in the adjustment for 
a table of 3 rows and 4 columns with 6[? — Q {p) = 1: 



168 


VIUl&MWK V. STKI'HW 


u i 

1 

m, 

»i i ’ 

0 



It) 4 

Pt 4 

. i' 

J < .a 

.j 

8 

s > 

A*' 

* t 

■ 


i 

_ I 

>J 

(2) ; 

ih 

hi < 

t5i 

7 

,* 


Ii 

n; 


U.'i 

ii 

733! 

_ 

! 

75 

- 

777 5 



772 *' 



771 

12 

742G' 


- - j 

455 

- 

7505 0 



71% .1 



7197 

13 

4700; 

i 

1 


358, 

— ! 

4712 0 



17(C) r, 



1711 

14 

2145 

_ 1 
i 

, 

170 

1 

2053 8 



2u'tl ,1 



2019 

21 

517 

j 

- t 

52 


528 9 


, 

5 



52*1 

22 

1)23 

1 

— , 

95 

— ■ 

D73 r 



‘<78, .1 



979 

23 

022 

_ 1 

— | 

56 

; 

630.5 



OI3 5 



Oil 

24 

703 

t 

- | 

— j 

7(1 

-- - 

688.7 



fi'H 9 



692 

31 

1 

207 

~ 1 

\ 

“ « 

in: 

— ' 

200 3 • 



201 1 



201 

32 

373 

1 ' 

1 

-1 

38 

— * 

309 1 



372 3 



373 

33 

337 

| ~ 


31 

— 

328.7; • 

- 


331 7 



332 

34 

125 


i 

39 


391 5 - 



397 5 



3*17 

1 

1507 

1 

15011 

-6; 

140, 

» 

- 041 

1506.7 ~5 7 

03! Kt 

■■ 1)800 

l.'jo.l 5 


5 

1/4)1 

.2 

8727 

881!) 

+122! 

588,+ .208, 

SH1H 1 4(1 !) 

4 0015 

f 2095 

HH|). 9 

* » 

1 

VU9 

3 

5608 

5087 

+ 1(1 1 

415 

+ 04 3 1 

5080 5: 4 0 2 

t 0139 

■ 1)509 

5*l8t J 


0 

5087 

,4 

3273 

3138 

-135' 

1 

285; 

1 

- 474; 

3139 0 -1 0 

- 0035 

- 4775 

.11 (O 7 

"2 

7 

3138 

1. 

15003 

; 15028 

1 —351 1064, 

- 033 

15051 5.-23 5 

- ,11221 

- 0551 

15+10 1 

*> 

1 

15028 

0 

2770 

1 2844 

+74 j 

273| 

+ .27' 

2830 5 +13 5 

+ (HU 5 

l - 3195 

2812 8 

-1 

.( 

*r 

28)1 

3. 

1342 

! 1303 

-30 

mj 

- 31 ! 

1292,0 +10 4 

1 UH19 

- 2281 

13* (2 0 

* 0 

1 

HU))! 


1 19171 

I 

) 19175 

0 

1 

1464 

; „ ! 

19174 0, +0.1, 

— 

... 

19)75 5 

- 0 

5 

19175 


Columns (1), (2) and (4) arc given. Columns (3) and 101 to Mil arc calcu¬ 
lated in succession using equations (3), (-1), and (5), It is not necessary in 
practice to record the fl's or even determine their values since the X^' may he 
determined directly at convenient values approximately equal to their corre¬ 
sponding + dl p ~ n /Ci. and X',"" 1 ’ 4- el', p "' i> /c. i . The final adjusted fre¬ 
quencies given in column (12) are derived hy another repetition of the adjust¬ 
ment process but the amounts involved are so small that they can lm calculated 
mentally and the n<> rounded at the same time. 

4, Computing procedure. The computing procedure may Ik* vt up in any 
of a number of ways to meet the preferences of the computer and tin* charac¬ 
teristics of the problem. Ordinarily it is desirable to make every number 
positive ancl the procedure as nearly routine as possible. 

For two-dimensional adjustments the following procedure of computing alter¬ 
nately by columns and by rows is convenient: 

(a) Set up a table of the c<j In r rows and $ columns. Enter the c,. in the 
s 4- 1 column, the c,, in the r + 1 row, and c,. =* £ c tl « £ <*,, in the com¬ 
mon cell. 



ADJUSTING SAMPLE FREQUENCY TABLES 


169 


(b) Calculate the quantities A , = (d ( , 0) /c,) -f a and A., = (d?/c.,) + a 
and enter them in the s + 2 column and r + 2 row. The constant a is selected 
at some value that yvill make all quantities in the computations positive and 
may be any convenient integer greater than 2 max | d, C0) /c,. | or 2 max | dl?/c., \. 

(c) Calculate the factors g,'. 1 ' approximately equal to the A % — \a and enter 
each on its corresponding row in the s + 3 column. Throughout the computa¬ 
tions the n v are merely X, + §a. 

(d) Take column j and multiply each c,, by its corresponding nl u accumu¬ 
lating the products in the calculating machine. Divide the sum of products 
by c.j , subtract the quotient from A } , and record the difference p, ( , 2) in thejth 
column on the r + 3 row. Repeat for each of the other columns. 

(e) Take row i and multiply each c,yby its corresponding p f accumulating 
the products in the calculating machine. Divide the sum of products by Cf., 
subtract the quotient from A,., and record the difference nl 3) on the ith row 
in the s + 4 column bordering the table on the right. Repeat for each of the 
other rows. 

(f) Repeat steps (d) and (e) alternately until a satisfactory degree of stability 
is reached in the p,, and /x.,. Then compute each adjusted frequency as follows: 

(8) m[? = + M.V 5 - a) + n tj , 

taking either nl v) — or n\ p) = a 8 the case may be. 

(g) The computations may be checked at any step by computing 


(9) 

it 

■n 

3-~ 

w- 

— X m1. p 11 c,. = ac. 

1 

- X Mir 
% 

or 




( 10 ) 

X fit? c.. = X Ai. d. 

i 1 

- X^’c, = ac.. 

3 

- XM?r 

1 


(h) At any step a constant may be added to all the nl? and subtracted from 
all the p., ; this may be necessary to keep the n’s all positive. It has no effect 
on the value of a to be used in (8). 

(i) If it is desired to “inflate” the adjusted frequencies (X m,y ^ X m,) 
first multiply each n,y, n; , and n , by the factor X wii;/X i and then proceed 

* 13 

aB above using the products in place of their corresponding , n,, and nj . 

(j) If before the iterative process has reached an acceptable adjustment it is 
desired to force a satisfaction of the condition equations, compute: 

(11) ’ = afol* + a?’ - o) + n,j + (dl v) c, + d^cj/c .., 

in which either the d[ p) or the d\ p) are all zero. 

5. Adjustments in three dimensions. If the sample is cross-tabulated in a 
three-way frequency distribution, there are two cases that are not reducible to 



170 


FKEDKIUOK E, KTl.MUN 


two-way distributions. The.ie are designated Case III ami (’asp \ II in flip 
earlier paper (11- The adjustment equations art-, ic-peetively, 

m# - h »jk 4- + \ r , pi 1- X f'j 

(12) mfn 1 = n„, + p^air* + x:r + x.,n, 

subject to conditions on the choice of the X corresponding tn equations <?>), Ill), 
and (7), For Case III, the conditions are that 

(13) 0 < 0\ p) , 0 < 6 lp) , 0 < o[V , g[ p) 4 Of* I- 0‘V < 2, 

and for at least one triple, ijk, 0\ p) {d\ p ; "j 1 + ">* 4* QVul‘1 ")* > 

0 and 0, lp) 4- o\ p) 4- 0.a P) < 2. Similar conditions apply tn Case VII. 

The computing procedure described in Section 4 cun be. extended readily to 
the three-dimensional case. For example, in Case VII calculate »[]' approxi¬ 
mately equal to (d[°!/c>/) + and pf* approximately equal to (>i;V ,V, *) 4 - V*. 
Then multiply each c,,* in the column ]k by its corresponding f- ^;V) 
accumulating the products in the calculating machine. Divide the sum of the 
products by c and subtract the quotient from (</ v ,V/o.iO 4- a. Hecortl the 

difference as am! and repeat the, process for every other jk colunm. Take 

m!” ~ ail' and repeat for each ik column to obtain mV ; then take v - ju?* 
and repeat for each ij column to obtain m!? and so cm. The final adjusted 
frequencies are 

(14) = jw,k + +- ma 4- h!iS' - a). 

6. The general case. The iterative method cun be extended readily to 
more than three dimensions and to various systems of condition equations, A 
simple general notation may now be introduced. Let the cells be munbcicd in 
any order from 1 to l and for the ith cell let ft, be, the value given by tin 1 sample, 
a a finite positive constant known or estimated to ho inversely proportional to 
the weight of n,, m { the least squares adjusted value to be determined, m*" 1 
the pth approximation to r ft,-, d[ 0 = ?n,- - m, (p \ and m' 0> ® n,. Assume that 
the values m, of certain linear combinations of the m,' are given, i.e. there is a 
system of consistent linear equations of condition numbered in any order, the 
o-th equation being 

'(!5) 52 h„m, = m., 22 tij, > 0, 

» » 

ii,v and m, being known a priori. The corresponding linear combinations of the 
ft; and d\ v) define 

( 16 ) n 4 = 52 b u «,, di v) = 22 b„d\ p *. 


Let 

(17) 


c* = 22 b? r Ci. 



ADJUSTING SAMPLE FREQUENCY TABLES 


171 


The pth approximation to m, is 

(18) m[ v) = Ux + c, £ 

a 

where 

( 19 ) ^ + tPdHr^/C', x' 0) = 0 , 

the 8 { /\ and therefore the X, p \ being arbitrary for a finite number of steps, 
say p', but determined thereafter so that 

(20) 2 £ 6l v) (dl p ~y/C' - E c*(£ b„0< p) di^/c,) 2 > {&-")*/(fi, 11), 

<r iff 

r being a value of cr, chosen at the pth step, for which (d^ - 1 ) ) 2 /c„ is a maximum 
and H a finite number greater than 1 fixed prior to the first step as large as one 
will. That this condition can be satisfied may be shown by putting 0 { T r) — 

1 and Q[ v) = 0 (<7 * r). 

A weighted average of several of the possible selections of ol P) satisfying (20) 
will also satisfy (20), positive "weights” being assumed Let k added to the 
superscript represent the /cth such selection and let a {v,k) > 0 be a constant for 
"weighting” the /cth selection in the weighted average which may be chosen 
arbitrarily except that E « <P ' W = 1- Then, if the fcth selection of f) ( / } is repre¬ 
sented by 6' P ' k) , the weighted averages are P^ p ’ 0, = E a p ' k) 6 ( /' k) , Substitute 

k 

them in the left-hand side of ( 20 ), 

2 Eo^id^y/c, - Ec,(Eb„8 < /' m cll l ’- ,) /c a y 

<r i <j 

(21) =2EE^ei p ' k \dl v -y/c,-Ec t {EEb,cc t (p ' k) e^ k) di v ' l) /c r y 

tr k i ok 

= E a (p,4) (2 E Q* v,k} (^ P_1) ) 2 /c.) - £c,(£« (M) EM! f ' ll 4 H )/c.) ! , 

k a \ k a 

which by the Cauchy-Schwarz inequality 

> £a (p ' fc, (2 E0* P ' k) (d< p -”) 2 /Cr) 

k o 

~ E c.(£ a (p '*>) {£ a^(E h' Q[ v ' k) d { r 1) /c,) 2 ) 

\ k k tf 

= E^ k) {2Eo l /' k) (d^) 2 /c, - £c,(£t w e' w d‘'’- 1 Vc.) i ! 

k a i o 

> E^ M (d l ry/(c T H) = (d ( r"y/(crii). 

k 

A simpler and more convenient but somewhat more restrictive condition may 
be derived as a special case of (20). Let 0, pi = 0 except for a set of one or 



172 


FIlEDKHK'K f stkphan 


more a so selected that - 0 for every i and every pair a' and a" in the 

set. Then (20) becomes 

(22) E W - (d'/^Y/r, > Ulr*~' > Y/(c,H). 

Differentiating partially for a maximum with respect to one of the $l*\ we find 
that this special case of the condition will be satisfied if for one ct in the set, 
say 7T, such that 

(23) (d c /" n )Vcr > (d? i y/(c,vrn, 

the value of fl ( r p> is chosen in the range, 

(24) 1/(2VS) < 0Y ] < 2 - 1/(2y77) 
and for every other a in the set 

(25) 0 < 8l p) < 2, 

all 8 i / ) not in the set being zero. A weighted average of such values of 0 will 
satisfy (20) whence (8) and (7) follow. 

In practice values of 0l P) satisfying (20) may be selected conveniently by the 
following procedure' 

(a) Select a set of a for at least one of which 0 fp) satisfies (23) and for every 

pair of which bi„'b lf n = 0. In so far as this restriction permits choose the <r 
corresponding to the larger values of /c ,. 

(b) Determine values for each $i p) in the set approximately equal to 1. 
Until other values are assigned to them assume all other 0 {p) =* Q, 

(c) Choose a a not in the set, say p, for which (d l /~ 1> ) a /c l , is relatively largo 
and select a value for such that 

(26) 0'<’> * {-EE <nK 8- B^d^/c .} fd\T" . 

i •/*/> 

(d) Having changed 0[ p) from 0 to a value approximately satisfying (2G), 
continue with other a not in the set letting p in (20) represent each in turn. 
The work may be terminated at any Btage leaving some 0 1 / 1 =» 0. 

7 Convergence of the adjustment. The condition equations may be written 
in the following form 

(27) E^4 0> - d?\ 

t 

as a system of consistent, but not necessarily independent, linear equations. 
They may also bo written as conditions on the m<. The least squares adjust¬ 
ment minimizes the quadratic form 

(28) S (Q) = E (dW 'ft 



ADJUSTING SAMPLE FREQUENCY TABLES 


173 


subject to the restraints (27) Since the c t are positive, S (0) is positive definite, 
and therefore a minimum exists and is non-negative. The values of the d[ 0) 
that minimize S m -while satisfying (27) are m, — n, , the n, being known and 
the m, being the least squares adjusted values that are to be calculated. 

If r is the rank of the matrix || |j, then from (15) and (1G) it follows that 

r of the d[ a> may be expressed as linear functions of the i — r other d[ m . The 
latter then constitute a set of t — r independent variables. The normal 
equations 

(29) dS^/ddh 1 = 0, 

are obtained by differentiating S <0) with respect to each one of them in turn, 
one equation insulting for each value of h corresponding to a d, m the set of 
independent variables. The normal equations (29) are a system of t — r 
independent linear equations and can be written in the form 

(30) 2«, w d$ 0) = 

where the first summation is over the set of independent variables, and the 
second over the d'f' in the ? selected condition equations. The right-hand 
terms aic constants. Since a least squares adjustment exists the equations are 
consistent and the rank of the matrix j| «,(*) || is f — r. Any df ] in the set, 
say is the quotient of two determinants the divisor being the determinant 
| a,(h) | and the dividend being the determinant obtained by replacing the 
«>'(*> by L/3 o(M dl m . Consequently each d\ 0) whether in the set or not is a 

linear combination of the dj 0) and the sum of the absolute values of the coeffi¬ 
cients of the dj Q) is finite. Therefore 

(31) max | d\ a) /s/c, \ < G max | d^’/VZ | 

where G is (max c„/min c,)* times the sum of the absolute values of the coef¬ 
ficients of the dj 0> in the linear combination for which such sum is a maximum. 
Fiom (28) 

(32) S w < t max ((dff/c.l < G'H max {(d‘ M )7c») 
whence 

(33) (d' 0) )7cr ^ S^/iGh). 

Consider now the discrepancies 

(34) d\» = m t - m\ p) = - c. Z b„ Ql p) di^/c. 

9 

between the and the corresponding approximations m\ T) and write the 
quadratic form 

(35) 


S M = Z (d\ p) T/c.. 



m 


FltKDKHIf'K K. sn i'HAN 


Fimu (10), (18), and (81) 

(30) if," * rfl*" -I■ r. £/>,»,X)"', 


and 

(371 


<C ~dt»' -1 -ElWnr.r. 

I u 


lionet! tlie substitution of (30) in (27) merely changes (<l) (o (;i) in the super¬ 
scripts, the new equations being; consistent, by dotiuitinii and I lie eon (•‘.ponding r 
of the cl[ v) being expressible ns linear functions of the other t r, Further 
(35) is positive definite and hence, has a minimum, in fact substituting (3d) in 
(28) we find that 


(38) 


dS (0> = rl iS' 1111 
dd[ 0> dd [ p1 


Mi 1 ' 1 


Ml* 


+ Er.&M'STt 


(,s () " 4-22^" xi 1 " 


fits' 

Ml’" 


- 0. 


Hence a least squares solution for tht» r/*'” existsamlit leads by (31) to (lie same 
values for the m, as does the solution for the <i[ u \ Since the coefficients o,m 
and P„(h) and the number G are functions of the f/„ and e, they are invariant 
for the substitution. Consequently (30), (31). (32), and (33) may also be 
written with (p) in place of (0) in tin* superscript-. (33) becoming 

(39) (4"’)7r r > ,^/(G‘l), 


From (20), (34), and (35) 

S w = D (dl'-r/e, 

l 

= Z WS"~ U )7ri - 2 £ Z 

‘ 1 * 

wo) + Zcf(Zw 0 ^" , rfO , ~ , 7c,) J 

t <r 

= , S t "“ 1 ’ ~ 2 + z r, (Z b„6 l /'d l S~'';c,) 3 

9 tfl ' 

< S ( "- u - (dj p - 1, )7(c r //), p > P' 

and from (39) 

(41) S w < f 8 (p - ,, - J S' (,, ~ l 7M < .S ,(p ' , il - i/A/| »* “ p ' 

where 


( 42 ) M = (? 7 //t. 

Therefore, asp—soo t p — p'—»<», ,S W —- 0, —+ 0, tn ( , p! —+ m, and conse¬ 

quently the successive adjusted frequencies obtained by an. iterative process in 



ADJUSTING SAMPLE FREQUENCY TVnL.ES 


175 


which condition (20) is satisfied converge to the; adjusted frequencies that are 
obtained by solving the normal equations 

8. Rate of convergence. The eompulev is not as much interested in the 
pioof of convergence as lie is in how rapidly the successive adjustments reach a 
satisfactory degree of approximation. Equations (3!)) or (41) are of no help 
to him. The adjustment may he made m one step, with every 0 = 1, (a) if 
the condition equations aie such that cveiy = 0, o' ^ a", i.c if the 

adjustment can he separated into one-dimensional cases when redundant condi¬ 
tion equations arc ignored, or (b), in the two and three-dimensional cases, if 
the c (J or c tJ i are proportional to the e, and e , oi to the c, , c , , c,x or c,, , 
r, jl , and c } k respectively. Except in these and othei special cases the rapidity 
of convergence depends on the d) 0> as well as on the || j| matnx However, 
it seems that one can make very little use of the rfj 01 to determine the rapidity 
of convergence without actually computing the successn e adjustments or making 
some equivalent calculation. 

Certain results can bo obtained from the j| h,„c, || matiix alone Returning 
to the two-dimensional ease and its notation, consider the matrix || e,,jj and 
define, 

(43) S.j = c,', Ci c ,/c. , c . = Z c i 

Let the adjustments be made with the restriction that $\theta_{i.}^{(p)} = 0$ and $\theta_{.j}^{(p)} = 1$ when $p$ is even, and $\theta_{i.}^{(p)} = 1$ and $\theta_{.j}^{(p)} = 0$ when $p$ is odd. Then if $p > 1$

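$$d_{i.}^{(p)} = -\sum_j \frac{c_{ij}}{c_{.j}}\, d_{.j}^{(p-1)} = \sum_j \sum_k \frac{c_{ij}}{c_{.j}}\,\frac{c_{kj}}{c_{k.}}\, d_{k.}^{(p-2)} = \sum_j \sum_k \frac{c_{ij}}{c_{.j}}\,\frac{\delta_{kj}}{c_{k.}}\, d_{k.}^{(p-2)} \qquad (i = 1, 2, \cdots, r). \qquad (44)$$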

The sum of the absolute values is

$$\sum_i |d_{i.}^{(p)}| \le b_1' \sum_i |d_{i.}^{(p-2)}| \le b_1'^{\,p/2-1} \sum_i |d_{i.}^{(2)}| \qquad (45)$$

where 

(40) b\ = E E I Sij/c., | 7 , 

I ) 

$\gamma_{.j}$ being the greatest of the $|\delta_{ij}/c_{i.}|$ in the $j$th column. Similarly for $p > 2$

$$\sum_j |d_{.j}^{(p)}| \le b_2' \sum_j |d_{.j}^{(p-2)}| \le b_2'^{\,(p-1)/2} \sum_j |d_{.j}^{(1)}| \qquad (47)$$

where 

(48) bl = E E I $.j/c». 17.. 

* J 

$\gamma_{i.}$ being the greatest of the $|\delta_{ij}/c_{.j}|$ in the $i$th row.





FREDERICK F. STEPHAN 


Assume again the conditions just preceding (44). Let $u_{i.}$ be the minimum $c_{ij}/c_{.j}$ in the $i$th row. Likewise let $v_{.j}$ be the minimum $c_{ij}/c_{i.}$ in the $j$th column. Then since $\sum_i d_{i.}^{(p)} = \sum_j d_{.j}^{(p)} = 0$,

$$\sum_i |d_{i.}^{(p)}| = 2\sum_i{}^{+}\, d_{i.}^{(p)} = -2\sum_i{}^{-}\, d_{i.}^{(p)}, \qquad (49)$$


the $+$ and $-$ signs indicating that the last two summations are over positive and negative values of $d_{i.}^{(p)}$ respectively. When $p$ is even, of course, all values of $d_{.j}^{(p)} = 0$.

From (44) 

(50) d™ = -Sc,7d. ( r 1 7c,^ ” 1/r,, 

i j ; 

= E ft, I d.V“ u l/c.j - 2 E + tit I A”" 1 ’ i/r I 

i 

= 2 E~ c., I I/ft, - E ft, I d? i/ft i 

) f 

and by (49) 

(51) I d S'” |< Eft, I d!r U I/ft, ~ E I <*.? ls! 

l ) 

$$\sum_j |d_{.j}^{(p)}| \le \sum_i |d_{i.}^{(p-1)}| \Bigl(1 - \sum_j v_{.j}\Bigr). \qquad (52)$$

Similarly 

$$\sum_i |d_{i.}^{(p)}| \le \sum_j |d_{.j}^{(p-1)}| \Bigl(1 - \sum_i u_{i.}\Bigr). \qquad (53)$$

Let $b_1 = 1 - \sum_i u_{i.}$ and $b_2 = 1 - \sum_j v_{.j}$; then

$$\sum_i |d_{i.}^{(p)}| \le b_1 b_2 \sum_i |d_{i.}^{(p-2)}| \le (b_1 b_2)^{p/2-1} \sum_i |d_{i.}^{(2)}|. \qquad (54)$$
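The bounds (52) and (53) can be checked numerically. The sketch below is our own plain-Python illustration, not the author's computing scheme: it assumes that each least-squares step spreads a row (or column) discrepancy over the cells in proportion to the weights $c_{ij}$, which reproduces the recursion (44), with odd steps adjusting rows and even steps adjusting columns as assumed above. The function names `bound_factors` and `alternate`, and the example table (whose frequencies also serve as weights), are hypothetical.

```python
def bound_factors(c):
    """b_1 = 1 - sum_i u_i. and b_2 = 1 - sum_j v_.j, with u, v as defined in the text."""
    r, s = len(c), len(c[0])
    row = [sum(c[i]) for i in range(r)]                       # c_i.
    col = [sum(c[i][j] for i in range(r)) for j in range(s)]  # c_.j
    u = [min(c[i][j] / col[j] for j in range(s)) for i in range(r)]  # row minima
    v = [min(c[i][j] / row[i] for i in range(r)) for j in range(s)]  # column minima
    return 1 - sum(u), 1 - sum(v)


def alternate(n, c, row_targets, col_targets, steps):
    """Odd steps adjust rows, even steps adjust columns, in proportion to c_ij.

    Returns the adjusted table and, for each step p, the sum of absolute
    discrepancies left over: column ones after odd p, row ones after even p.
    """
    r, s = len(n), len(n[0])
    m = [[float(x) for x in row] for row in n]
    row = [sum(c[i]) for i in range(r)]
    col = [sum(c[i][j] for i in range(r)) for j in range(s)]
    history = []
    for p in range(1, steps + 1):
        if p % 2 == 1:
            for i in range(r):
                d = row_targets[i] - sum(m[i])                       # d_i.
                for j in range(s):
                    m[i][j] += c[i][j] * d / row[i]
            left = [col_targets[j] - sum(m[i][j] for i in range(r)) for j in range(s)]
        else:
            for j in range(s):
                d = col_targets[j] - sum(m[i][j] for i in range(r))  # d_.j
                for i in range(r):
                    m[i][j] += c[i][j] * d / col[j]
            left = [row_targets[i] - sum(m[i]) for i in range(r)]
        history.append(sum(abs(x) for x in left))
    return m, history


n = [[10, 20, 30], [20, 10, 15], [5, 25, 10]]
b1, b2 = bound_factors(n)
m, h = alternate(n, n, [65, 40, 40], [40, 50, 55], steps=20)
for p in range(2, len(h) + 1):
    factor = b1 if p % 2 == 0 else b2     # (53) for even p, (52) for odd p
    assert h[p - 1] <= factor * h[p - 2] + 1e-9
print(round(b1, 3), round(b2, 3))         # prints 0.39 0.403
```

Step 1 is excluded from the check because at $p = 0$ both row and column discrepancies are present, so the recursion behind (52) and (53) only starts with the second step.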

Now $b_1$ or $b_2$ may be greater or less than $b_1'$ or $b_2'$ but, unlike $b_1'$ and $b_2'$, they cannot exceed unity. Let $b_3$ be the lesser of $b_1'$ and $b_1 b_2$. Then under the conditions stated with equation (44)

(55) 2 | d ( , p+l> | < 2 | d\ p) | < 6*2 | d\T* | < 6 j, "*2 \ d'*> | < 2 [ d?, 1 ’ |. 

It follows from (40) that 

s (,,) - s (p+l> + E (d < ,. p, )Vft. + E Wf/')Vc./ 


( 56 ) 


-£ (E (dif’)Vft. + E (dJf’)Vft/} 

*~p < t 

^ E {(El DVmin«, + (E |df*’ |)*/minc,/} 

"-P » f 

< (E i di. pl | + | dJr" j) ! ((l/min «.) + (1/miac.,•)}/(! - 6*). 





The reduction in $S^{(p)}$ in $g$ steps of the iterative process is


v+o—1 


(57) 


D = - S {p+o) - E E (d\ h) ) 2 / Cl + Z (d^f/c ,] 

A-P i 7 


P+J7-1 


> Z [(Z [ 1)7 7 max c,) + (Z I d.?’ |)7( s max c ,)]. 

A-p * 7 


from which, by (55), if $g > 1$ is odd,


(58) £ > \ — ~ (Z I d[ p+0) | + | d[ p hff+1> |) a l —I— + —I—'j . 

l — o* , \rmaxc, s max c,,/ 


1 - b - 4 

The relative decrease in $S^{(p)}$ is, therefore, by (56),


(59) 


D_ 

S<»> 


D 


D + £'”+<» 


> 1 + 


1 /min c, T 1 /mrn c ; 


b\h 


—417 


-«c- 


max c v 


+ 


s max c., 


If the $g$ steps actually have been taken, a better lower limit for the relative decrease in $S^{(p)}$ may be obtained by computing $D$ from (57) and using (56) for $S^{(p+g)}$. Similar equations can be written using $b_2$.

These results can be shown to be valid for an adjustment in which $\theta_{i.}^{(p)} = \theta_{.j}^{(p)} = 1$ at the first and any of the subsequent steps. They also can be extended to the three-dimensional cases but not to three-dimensional adjustments with every $\theta = 1$.


9. Improvement resulting from the adjustment. The least squares adjustment eliminates a portion of the errors of sampling, i.e., a portion of $\chi^2$, from the set of frequencies estimated from the sample. In fact any adjustment that satisfies the condition equations does this.

Let $\varepsilon_\alpha$ be the error in the $\alpha$th value given by the sample and $\varepsilon_\alpha^{(p)}$ the error in the $p$th approximation to the least squares adjusted value. Then

(60) 5 [ p) = ^ + c 1 Zl.x77 

a 

and 

(61) Z (sl'OVc, = Z ',/c. + 2 Z «) P) ~ Z c. (Z 6. x7’) 2 ■ 

1 I <r 1 <r 

* 

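The complete adjustment makes the middle term vanish and therefore, since the last term is non-negative, the weighted sum of squared errors of the completely adjusted values is less than $\sum \varepsilon_\alpha^2/c_\alpha$, except in the trivial case in which all $d_\sigma^{(0)} = 0$.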

From (37) 

(62) Z (5 < v" , )Vc, = Z 4/c,- + Z K (<f ( , p> - d™). 

v I tf 

The last term may be computed readily at any stage in the iteration. If the sampling is at random, $k\sum \varepsilon_\alpha^2/c_\alpha$ is distributed approximately as $\chi^2$ with $t - 1$ degrees of freedom, where $k$ is the ratio of the $c_\alpha$ to the corresponding error




variances of the $n_\alpha$. Therefore it would seem appropriate to compute $k$ times the last term of (62), the reduction in $\chi^2$, as a measure of the improvement achieved in the final adjustment.

10. The method of iterative proportions. The iterative method described in the earlier paper [1] implicitly defines, in the two-dimensional case,

$$m_{ij} = g_{i.}\, g_{.j}\, n_{ij}, \qquad (63)$$

the $g_{i.}$ and $g_{.j}$ being given by the $r + s$ condition equations

$$m_{i.} = \sum_j g_{i.}\, g_{.j}\, n_{ij}, \qquad m_{.j} = \sum_i g_{i.}\, g_{.j}\, n_{ij}, \qquad (64)$$

any $r + s - 1$ of which constitute a consistent system of independent equations in $r + s$ unknowns. One multiplier, say $g_{1.}$, may be fixed arbitrarily. Then for a $2 \times s$ table it is necessary to solve an equation of the $s$th degree. If $s = 2$, there is only one acceptable solution, given by the positive root; if $s = 3$, there is only one solution of the cubic for which all the adjusted frequencies are non-negative. For $3 \times 3$ and larger tables the adjustment appears to involve the solution of equations of the tenth or higher degree, and there is then no choice but to use methods of approximation.

The adjusted frequencies given by the method of iterative proportions are not identical to those given by the method of least squares. When the adjustments are small relative to the frequencies adjusted, however, the results given by this method approximate those of least squares. For the two-dimensional case the successive adjustments converge to a set of frequencies that satisfy the condition equations. The author has not found a proof of convergence or divergence for more than two dimensions.
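For comparison with the least squares process, the method of iterative proportions can be sketched in a few lines: each step rescales a whole row, or a whole column, so that it reaches its expected total. The function below is a hypothetical plain-Python illustration that assumes strictly positive sample frequencies and consistent marginal totals. Because every step multiplies a row or a column by a constant, the result has the form (63), so in a $2 \times 2$ table the cross-product ratio $m_{11}m_{22}/(m_{12}m_{21})$ stays equal to that of the sample.

```python
def iterative_proportions(n, row_targets, col_targets, sweeps=200, tol=1e-12):
    """Alternately scale the rows and columns of n to the given marginal totals."""
    m = [[float(x) for x in row] for row in n]
    r, s = len(m), len(m[0])
    for _ in range(sweeps):
        for i in range(r):                      # row step: multiplies row i by a g_i. factor
            factor = row_targets[i] / sum(m[i])
            m[i] = [x * factor for x in m[i]]
        for j in range(s):                      # column step: multiplies column j by a g_.j factor
            factor = col_targets[j] / sum(m[i][j] for i in range(r))
            for i in range(r):
                m[i][j] *= factor
        # The columns now match exactly; stop once the rows match as well.
        if max(abs(sum(m[i]) - row_targets[i]) for i in range(r)) < tol:
            break
    return m


n = [[30.0, 10.0], [20.0, 40.0]]
m = iterative_proportions(n, [50.0, 50.0], [45.0, 55.0])
print([[round(x, 3) for x in row] for row in m])
```

The example table has cross-product ratio $30 \cdot 40/(10 \cdot 20) = 6$, and the adjusted table keeps that value while meeting both sets of marginal totals.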

I wish to express my appreciation of many stimulating conversations with Dr. W. Edwards Deming on this and related problems, and of the helpful critical reading of certain portions of the manuscript by Dr. Joseph F. Daly.

REFERENCE

[1] W. Edwards Deming and Frederick F. Stephan, "On a least squares adjustment of a sampled frequency table when the expected marginal totals are known," Annals of Math. Stat., Vol. 11 (1940), p. 427.



ESTIMATION OF VOLUME IN TIMBER STANDS BY STRIP SAMPLING 

By A. A. Hasel 

California Forest and Range Experiment Station 1 

1. Introduction. The present paper is the second of a proposed series, in which it is intended to present a systematic study of the properties of several methods of sampling timber stands and of statistical treatments of the samples.

The effects of size, shape, and arrangement of sampling units on the accuracy 
of sample estimates of timber stand volume were reported in the earlier paper [1] 
for 5,760 acres of the Blacks Mountain Experimental Forest. With complete 
inventory data, the nature of stand variation was shown to be such that 2.5-acre 
plots, the smallest size tested, were more efficient sampling units than larger 
plots, i.e., for a given intensity of sampling the sampling error was smaller.
Long, narrow plots were more efficient than square plots of the same size. 
Line-plot sampling units consisting of two or more equally spaced plots along 
lines of fixed length were as efficient as single-plot sampling units and more 
efficient than strips consisting of plots contiguous end to end. Improvement in 
the accuracy of estimates was obtained by subdividing the area into rectangular 
blocks of equal size, and sampling each block to the same intensity. By systematic sampling, whereby the center lines of parallel line-plot or strip sampling units were spaced equidistant, the sample estimates of stand volume were improved over estimates from comparable random samples. Treatment of the volumes on individual plots of systematic samples as random sampling observations, however, as is sometimes done in practice, was shown to give seriously biased estimates of sampling error.

In the present paper we shall be concerned with sample estimates from strip 
samples taken within blocks of irregular shape, and consequently with sampling 
units which vary in length within samples. The methods will be equally
applicable to line plot samples. 

Following the general ideas expressed by Neyman [2] it is felt that: (1) If the formulae of the theory of probability have to be applied at all to the treatment of samples, the theoretical model of sampling must involve some element of randomness. (2) This element of randomness may conveniently be introduced
by a random selection of the sample, but may also be assumed present in the 
distribution of deviations of timber stand volumes in the area sampled from a 
postulated pattern. (3) Many attempts to treat systematic arrangements 
statistically are faulty because the treatment consists in applying to systematic 
arrangements formulae that are deduced under the assumption of randomness. 
If the arrangement of sampling is a systematic one, and random errors are 


1 Maintained by the U. S. Department of Agriculture at Berkeley, in cooperation with
the University of California. 




ascribed to Nature, then the treatment of the data should be based on formulae deduced under explicit assumption of the systematic arrangement of sampling and of some random element in the material. An example of this kind of treatment is provided by Neyman's method of parabolic curves [2] devised for the treatment of systematically arranged agricultural experiments. (4) Lastly, a mathematical treatment of any practical problem is useful only if the predictions of the theory are in satisfactory agreement with the empirical facts. Whether the method of sampling is random or systematic, the mathematical theory of sampling always involves certain elements that are postulated, either in respect to the method of sampling itself or in respect to the material sampled. To have a reasonable certainty that a particular mathematical treatment is useful in practice it is necessary to make empirical studies to find out whether the deviations from postulates of the theory that may occur in actual situations do or do not seriously affect the validity of the predictions.

2. Notation and definitions. Before proceeding to the main subject of this paper it may be useful to explain the meaning of certain statistical terms and symbols. Following Neyman, a sharp distinction is made between three different conceptions that are frequently confused by the practical statistician.

Definition 1: If $u_1, u_2, \cdots, u_N$ are any fixed numbers, whether provided by some already completed experiment involving randomness, or just arbitrarily selected, Karl Pearson's term "standard deviation" of these numbers and the letter $S$ will be used to denote the expression $S = \sqrt{\sum (u_i - \bar u)^2 / N}$, in which $\bar u = \sum u_i / N$ is the mean of the $u$'s.

Now let $X$ denote a random variable, that is, a variable the value of which is going to be determined by a chance experiment. Thus $X$ may be the timber volume on a strip that is going to be selected at random from an area. Denote by $E(X)$ the mathematical expectation of a variable $X$ capable of possessing values $u_1, u_2, \cdots, u_n$. Then

$$E(X) = u_1 p_1 + u_2 p_2 + \cdots + u_n p_n,$$

in which the $p$'s are the respective probabilities of all possible different values of $X$.

Definition 2: The words "standard error of $X$" and the letter $\sigma_X$ will be used to denote the expression

$$\sigma_X = \sqrt{E[X - E(X)]^2}.$$

It will be noticed that the standard error of a random variable $X$ may have its value equal to the standard deviation of some numbers $u$, but this does not mean that the two conceptions are identical or even similar. $E(X)$, and consequently $\sigma_X$, can be calculated only when the probability law of $X$ is known, and they are constant for the population from which samples are drawn. On the other hand, $S$ can be calculated for any sample of the population and changes in value from one sample of $u$'s to another.





Before proceeding to the third conception, that of an estimate of the standard 
error, which is occasionally confused with the standard deviation or the standard 
error, the unbiased estimate of a parameter must be defined [3]. 

Consider a set of $n$ random variables $X_1, X_2, \cdots, X_n$. These may be, for example, the volumes of timber to be observed on $n$ strips that are going to be selected from some area by one random method or another. Denote by $\theta$ a parameter involved in the probability law of the $X$'s. For example, $\theta$ may be the total volume of timber in the area.

Let $F$ be any function of the $X$'s.

Definition 3: If it happens that the mathematical expectation of $F$ is identically equal to $\theta$, then it will be said that $F$ is an unbiased estimate of $\theta$.

Usually there will be an infinity of unbiased estimates of a parameter $\theta$. They may be classified by the nature of the function $F$. Thus linear estimates may be considered such that

$$F = \lambda_0 + \lambda_1 X_1 + \cdots + \lambda_n X_n,$$

in which the $\lambda$'s stand for some fixed numbers.

Definition 4: It will be said that a linear unbiased estimate of $\theta$ is the best linear unbiased estimate (B. L. U. E.) if its standard error is smaller than or, at most, equal to that of any other linear unbiased estimate.

It happens frequently that, while it is possible to determine the best linear 
unbiased estimate F of a parameter, it is not possible to calculate the value of 
its standard error, $\sigma_F$. For this purpose it would be necessary to know the whole population sampled. In such cases an unbiased estimate of the square, $\sigma_F^2$, is calculated. An unbiased estimate of the square of the standard error
of F will be denoted by n \. This is the third of the conceptions mentioned 
above. 

The reason for the extensive use of the linear unbiased estimates and of their 
standard errors considered as measures of accuracy is the so-called Theorem of 
Liapounoff. Its content can roughly be explained as follows: if the variables $X_1, X_2, \cdots, X_n$ are independent and the number $n$ is not too small, then the probability that $F - \theta$ will exceed a fixed multiple of $\sigma_F$ is approximately equal to the probability as determined by the normal law. The above conclusion remains true whatever the probability distribution of the $X$'s that is likely to be met in practice, and also in certain cases where the $X$'s are mutually dependent, for example, when they are determined by sampling a finite population without replacement [4].

The above conclusions do not apply to estimates that are biased in the sense of the above definition. Also the standard error of such an estimate would not
be a satisfactory measure of its accuracy. 

3. Description of data. Complete inventory data from the Blacks Mountain 
Experimental Forest, located in the Lassen National Forest, provide suitable 
material for testing the applicability of sampling theory to timber cruising. 





The timber is a virgin, all-aged stand, classed as pure pine type, with more than 90 per cent of the volume in ponderosa pine and Jeffrey pine. Most of the volume is in over-mature trees, i.e., trees over 300 years in age. The stand is considered to be fairly representative of the medium and the poor site qualities of the northeastern California plateau.

With the exception of a few localities, all of the area was mapped as of uniform timber type according to the standards commonly used. Being fairly uniform also with respect to site quality, it may therefore be considered as a single stratum. Variability of stand volume from place to place within a stratum may be generally expected to be less, on the average, than variability between places in different strata. Likewise, within a stratum, variability within compact subdivisions may be expected to be less than average variability within the whole. Heterogeneity can therefore be controlled somewhat by subdividing the stratum into blocks and treating each block as a separate population.

More frequently than not, in practice, volume estimates are needed both for 
the total timbered area and for separate working units or compartments within 
the area. In general, working unit boundaries are defined by roads, ridge tops, 
drainage channels, and regular land subdivision lines. These working units 
can be taken conveniently as blocks, or if large enough, may be subdivided into 
two or more blocks. Such is the basis used for subdividing the area in the 
present study. 

The complete inventory data for these blocks are given in Table I. All the strips are 2½ chains in width and extend in an east-west direction. The length, $X$, is given in 10-chain units, and the volume, $Y$, is given in units of 1,000 feet board measure.

4. Method of estimation based on correlation between volume and strip length. The usual practice in sampling timber stand volume is to take measurements on plots or strips that are either regularly spaced or selected at random from all possible plots or strips within blocks. Oftener than not, blocks are irregular in shape, and the number of plots along lines or the lengths of strips will vary. This variation introduces the matter of proper "weighting" in calculating sample statistics. Such is the case in 15 of the 20 Blacks Mountain blocks.

If we let $Y_i$ represent the volume on the $i$th strip of length $X_i$, with length expressed, say, in 10-chain units, and assume that the entire block contains a population of $N$ strips, then the average volume to the unit of strip is $\beta = \sum_{i=1}^{N} Y_i \big/ \sum_{i=1}^{N} X_i$. It is obvious that, if $\sum X_i$ is known, and this is assumed to be true, the problem of estimating $\beta$ is equivalent to that of estimating the total volume. The usual procedure of estimating is this:

Out of the $N$ strips within the block a sample of $n$ is taken, giving $n$ pairs of numbers selected out of the $X$'s and $Y$'s. Let us denote them by

$$x_1, y_1;\; x_2, y_2;\; \cdots;\; x_n, y_n.$$





The ratio $b = \sum_{i=1}^{n} y_i \big/ \sum_{i=1}^{n} x_i$ is then considered an estimate of $\beta$, so that the estimate of the total volume in the block is $b \sum_{i=1}^{N} X_i$.

Our purpose now will be to study the above estimate $b$ from the point of view of unbiasedness. In this paper it is assumed that the sampling of strips is purely random. To find out whether $b$ is an unbiased estimate or not, its expectation must be calculated. This will be done in two steps. To begin with, assume that the values $x_1, x_2, \cdots, x_n$ are chosen in one way or another and fixed. The value of $b$ will then depend on the $y$'s only. It is possible that to a given value of $x$, say $x_1$, there will correspond just one value of $y$ in the block, but generally there will be several strips of the same length $x_1$ with varying volumes of timber. The selection of any strip of this group to be included in the sample will keep the denominator of $b$ constant, but will cause some variation in the numerator. The expectation of $b$ calculated under the assumption that the $x$'s are fixed is

$$E(b \mid x_1, x_2, \cdots, x_n) = \sum_{i=1}^{n} E(y_i \mid x_i) \Big/ \sum_{i=1}^{n} x_i, \qquad (1)$$

in which $E(y_i \mid x_i)$ denotes the expectation of $y_i$ calculated under the assumption that $x_i$ has a fixed value. Obviously $E(y_i \mid x_i)$ will be what is called the regression function of $y$ on $x$, or of volume on the length of strip.

It is safe to say that the graph of $E(y \mid x)$ would almost always be rather irregular. On the other hand, it is known that the substitution of smooth curves representing the regressions for the true irregular polygons frequently gives results that are surprisingly accurate. Therefore it would not be unreasonable to use the assumption that $E(y \mid x)$ can be represented by a polynomial of some moderate degree,

$$E(y \mid x) = A_0 + A_1 x + A_2 x^2 + \cdots + A_q x^q.$$

Substitution of this expression in (1) gives

$$E(b \mid x_1, x_2, \cdots, x_n) = \frac{nA_0}{\sum_{i=1}^{n} x_i} + A_1 + A_2 \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i} + \cdots + A_q \frac{\sum_{i=1}^{n} x_i^q}{\sum_{i=1}^{n} x_i}.$$

But this conditional expectation of $b$, calculated under the assumption of fixed $x$'s, is only an intermediate stage in the calculations. We need an absolute expectation, calculated under the assumption that the $x$'s are selected at random. This gives

$$E(b) = A_0 E\left(\frac{n}{\sum x_i}\right) + A_1 + A_2 E\left(\frac{\sum x_i^2}{\sum x_i}\right) + \cdots + A_q E\left(\frac{\sum x_i^q}{\sum x_i}\right). \qquad (2)$$



TABLE I
Complete inventory data for 15 blocks of the Blacks Mountain Experimental Forest
[For each block number: strip number (strips numbered in order from north to south within blocks), strip length X in 10-chain units, and volume Y in 1,000 feet board measure, with block totals. The tabulated figures are not legible in this copy.]






The value of $\beta$ has the form of (2), except that instead of $E\left(\sum x_i^k / \sum x_i\right)$ it contains $\sum X_i^k / \sum X_i$, the sums now extending over all $N$ strips of the block. Since in general the former does not necessarily equal the latter, for the unbiasedness of $b$ it is necessary and sufficient that $A_0 = A_2 = \cdots = A_q = 0$. This condition implies that the regression line of $y$ on $x$ is a straight line and passes through the origin of coordinates,

$$E(y \mid x) = A_1 x. \qquad (3)$$
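Whether $b$ is biased can be seen directly on a small artificial block. In the sketch below the strip lengths and volumes are made up and lie exactly on a straight line with a non-zero intercept, so that (3) fails; the function `expected_ratio` (our name) enumerates every sample of $n$ strips drawn without replacement, averages $b$ over them with exact fractions to obtain $E(b)$, and the result is compared with $\beta$.

```python
from fractions import Fraction
from itertools import combinations


def expected_ratio(X, Y, n):
    """E(b), b = sum(y)/sum(x), averaged over all equally likely samples of n strips."""
    samples = list(combinations(range(len(X)), n))
    total = sum(Fraction(sum(Y[k] for k in s), sum(X[k] for k in s)) for s in samples)
    return total / len(samples)


X = [1, 2, 3, 4, 6]                   # hypothetical strip lengths, 10-chain units
Y = [50 + 100 * x for x in X]         # volumes on a line with intercept A_0 = 50
beta = Fraction(sum(Y), sum(X))
print(expected_ratio(X, Y, 2), beta)  # the two differ, so b is biased here
print(expected_ratio(X, [100 * x for x in X], 2) == 100)  # line through origin: True
```

Here $E(b) = 100 + 50\,E(2/\sum x_i)$ while $\beta = 100 + 50 \cdot 5/16$, and because $\sum x_i$ varies from sample to sample the first exceeds the second; when the volumes are exactly proportional to length, every sample gives $b = 100$.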

Whether (3) is satisfied is a question of fact and can be authoritatively answered only by direct studies of regressions on some extensive inventory data. It may be noted also that in order to presume that (3) is usually satisfied, it should be established for a large number of areas. On the contrary, if a study of only a few areas shows that (3) is not true, then it would not be wise to take it for granted when attempting to make a sampling inventory of an unfamiliar area.

To investigate this point, linear regression equations of volume on the length 
of strip were calculated for 15 blocks of the Blacks Mountain Experimental 
Forest and it was found that the constant terms were both positive and negative 
with their absolute values varying from 12 to 677. The conclusion drawn is that the usual estimate $b$ of the average volume per unit of strip is likely to be biased and that there is justification in looking for an alternative method leading to unbiased estimates.

5. Best linear unbiased estimate of volume, based on the linear regression of volume on length of strip. In this section will be suggested a method of estimating the total volume, say $\theta$, of a timber stand, which could be considered as an improvement on the one considered above. The new method consists of using a linear unbiased estimate of $\theta$. In order to deduce the form of this estimate, certain assumptions have to be taken for granted concerning the timber stands to be sampled, and if it happens that these assumptions are unsatisfied in a particular case, the new estimate will not necessarily possess the desired property of unbiasedness.

In deducing the estimate $F$ it will be assumed that the timber stand to be sampled satisfies the following conditions: (1) that the regression of timber volume on length of strip, $X$, be (approximately) linear and (2) that the variability of the $Y$'s for a fixed $X$ is precisely known. It will not be assumed, however, that the linear regression line passes through the origin of coordinates, and this will allow $F$ to be unbiased in such cases, as exhibited above, where $b$ is biased. Following the Markoff method [5], [3] it can easily be shown that there is an infinity of linear estimates of $\theta$ which are unbiased under condition (1). It follows that a choice can be made among them so as to diminish the standard error. This, however, is possible only when something is known about the variability of the $Y$'s when the value of $X$ is fixed. For the present we shall assume condition (2) concerning this particular point, but in practice this will generally be quite impossible. This point will be considered further in Section 6.





Consider then a sure or non-random variable³ $X$ able to assume the particular values $X_1, X_2, \cdots, X_s$. Assume that there is a finite population $\pi_i$ of $N_i$ numbers $u_{i1}, u_{i2}, \cdots, u_{iN_i}$ corresponding to each value $X_i$, $i = 1, 2, \cdots, s$. Assume that the mean $\bar u_i$ of the population $\pi_i$ is a linear function of $X_i$, i.e., for any $i$,

$$\bar u_i = A + BX_i,$$

with some unknown values of $A$ and $B$.

Assume that out of each population $\pi_i$ there is selected without replacement a random sample of $n_i$ individuals, with $0 < n_i \le N_i$, and denote by $y_{i1}, y_{i2}, \cdots, y_{in_i}$ the values of the $u$'s to be drawn.

If the regression of the amount of timber on the length of the strip is linear, then the problem of estimating the total stand is equivalent to that of estimating

$$\theta = \sum_{i=1}^{s} N_i (A + BX_i) = A\sum_{i=1}^{s} N_i + B\sum_{i=1}^{s} N_i X_i.$$

Since the length of the strip, $X$, could be measured from any arbitrarily chosen origin, no generality will be lost by assuming $\sum_{i=1}^{s} N_i X_i = 0$, so that $\theta = A\sum_{i=1}^{s} N_i = AN$ (say). Weighting the $y_{ij}$ equally for each fixed $i$, the B. L. U. E. of $\theta$ may be denoted by

$$F = \sum_{i=1}^{s} \lambda_i \bar y_{i.}, \qquad (4)$$

in which $\bar y_{i.} = \sum_j y_{ij}/n_i$. Here the $\lambda$'s must satisfy the conditions of unbiasedness,

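$$E(F) = \theta, \qquad (5)$$

and of optimum,

$$\sigma_F = \text{minimum}. \qquad (6)$$

It may easily be shown that condition (5) will be fulfilled by (4) if the $\lambda_i$'s are so selected that

$$\sum_{i=1}^{s} \lambda_i = N; \qquad \sum_{i=1}^{s} \lambda_i X_i = 0. \qquad (7)$$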


3 This is an English translation of the excellent French terms "nombre certain" and "fonction certaine", invented by Fréchet to denote a non-random number and a non-random function.





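Condition (6) may now be considered. From the general formula for the variance of a linear function of several random variables and the fact that $y_{ij}$ is independent of $y_{kl}$ for $i \ne k$,

$$\begin{aligned}
\sigma_F^2 &= \sum_{i=1}^{s} \frac{\lambda_i^2}{n_i^2}\Bigl\{ n_i E(y_{ij} - \bar u_i)^2 + n_i(n_i - 1) E[(y_{ij} - \bar u_i)(y_{ik} - \bar u_i)] \Bigr\} \quad (j \ne k) \\
&= \sum_{i=1}^{s} \frac{1}{n_i^2}\Bigl[ n_i \lambda_i^2 S_i^2 - \frac{S_i^2}{N_i - 1}\bigl(n_i^2 \lambda_i^2 - n_i \lambda_i^2\bigr) \Bigr] \\
&= \sum_{i=1}^{s} \frac{S_i^2}{n_i}\,\frac{N_i - n_i}{N_i - 1}\,\lambda_i^2 = \sum_{i=1}^{s} A_i^2 \lambda_i^2 \ \text{(say)}, \qquad (8)
\end{aligned}$$

in which $S_i^2$ stands for the (S.D.)² of the population $\pi_i$, i.e.,

$$S_i^2 = \frac{1}{N_i}\sum_{j=1}^{N_i} (u_{ij} - \bar u_i)^2.$$

In addition to satisfying equations (7), the $\lambda_i$'s must be selected so as to minimize (8).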

Using the method of Lagrange, we find

$$\lambda_i = \frac{\alpha + \beta X_i}{A_i^2}, \qquad (9)$$

for the case where $0 < n_i < N_i$ and $A_i \ne 0$. If $n_i = N_i$, then $A_i = 0$ and $\alpha + \beta X_i = 0$.

Assume first that all $n_i < N_i$, $i = 1, 2, \cdots, s$. Then $\alpha$ and $\beta$ are obtained from equations (7) after substituting in them (9), namely

$$\alpha \sum w_i + \beta \sum w_i X_i = N, \qquad \alpha \sum w_i X_i + \beta \sum w_i X_i^2 = 0, \qquad (10)$$

where, for simplicity,

$$w_i = \frac{1}{A_i^2} = \frac{(N_i - 1)n_i}{(N_i - n_i)S_i^2}; \qquad \sum_{i=1}^{s} w_i = W. \qquad (11)$$

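If $w_i$ is considered as the weight of the observations at $X = X_i$, it will be convenient to introduce a weighted mean and weighted S.D. of the $X$'s as follows:

$$\bar x = \frac{\sum w_i X_i}{W}; \qquad S_x^2 = \frac{\sum w_i X_i^2}{W} - \bar x^2. \qquad (12)$$

With this notation equations (10) can be rewritten and easily give

$$\alpha = \frac{N(S_x^2 + \bar x^2)}{W S_x^2}, \qquad \beta = -\frac{N \bar x}{W S_x^2}. \qquad (13)$$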





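Substituting these values into (4), simple transformations give

$$F = N(\bar y - \bar x b_0), \qquad (14)$$

in which $\bar y = \sum w_i \bar y_{i.}/W$, and $b_0$ represents the unbiased estimate of $B$ and is given by

$$b_0 = \Bigl[\frac{1}{W}\sum w_i X_i \bar y_{i.} - \bar x \bar y\Bigr] \Big/ S_x^2.$$

The next step is to calculate $\sigma_F^2$. Substituting (9) in (8) and using (11) and (12) gives

$$\sigma_F^2 = W(\alpha + \beta \bar x)^2 + W\beta^2 S_x^2.$$

Using (13), which gives $\alpha + \beta\bar x = N/W$, we have finally

$$\sigma_F^2 = \frac{N^2}{W}\Bigl(1 + \frac{\bar x^2}{S_x^2}\Bigr). \qquad (15)$$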

If $X$ is the length of a given strip in any chosen units and $\bar X$ the average of such $X$'s for a given block, then (14) and (15) may be written

$$F = N[\bar y + b_0(\bar X - \bar x)], \qquad \sigma_F^2 = \frac{N^2}{W}\Bigl[1 + \frac{(\bar X - \bar x)^2}{S_x^2}\Bigr]. \qquad (16)$$
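A minimal sketch of the computation in (11), (12) and (16), assuming every $0 < n_i < N_i$, at least two distinct strip lengths, and known $S_i^2$ (which in practice they are not; see Section 6). The function name `blue_total` and the numbers in the example are hypothetical. As a check of the unbiasedness conditions (7), if each $\bar y_{i.}$ is replaced by its population mean $A + BX_i$, the estimate returns $\theta = \sum N_i(A + BX_i)$ exactly.

```python
def blue_total(X, N, n, S2, ybar):
    """Return (F, sigma_F^2) from (16), with weights (11) and weighted moments (12).

    X: strip lengths X_i; N: population counts N_i; n: sample sizes (0 < n_i < N_i);
    S2: population variances S_i^2 (assumed known); ybar: sample means of the y_ij.
    """
    N_tot = sum(N)
    X_bar = sum(Ni * Xi for Ni, Xi in zip(N, X)) / N_tot                # block mean length
    w = [(Ni - 1) * ni / ((Ni - ni) * s) for Ni, ni, s in zip(N, n, S2)]  # weights (11)
    W = sum(w)
    x_w = sum(wi * Xi for wi, Xi in zip(w, X)) / W                      # x-bar (12)
    Sx2 = sum(wi * Xi ** 2 for wi, Xi in zip(w, X)) / W - x_w ** 2      # S_x^2 (12)
    y_w = sum(wi * yi for wi, yi in zip(w, ybar)) / W                   # weighted y-bar
    b0 = (sum(wi * Xi * yi for wi, Xi, yi in zip(w, X, ybar)) / W - x_w * y_w) / Sx2
    F = N_tot * (y_w + b0 * (X_bar - x_w))                              # (16)
    var_F = N_tot ** 2 / W * (1 + (X_bar - x_w) ** 2 / Sx2)             # (16)
    return F, var_F


X, N, n = [1.0, 2.0, 3.0, 4.0], [10, 20, 15, 5], [3, 5, 4, 2]
S2 = [100.0, 150.0, 200.0, 250.0]
A, B = 40.0, 25.0
means = [A + B * x for x in X]           # population means A + B X_i
F, var_F = blue_total(X, N, n, S2, means)
theta = sum(Ni * m for Ni, m in zip(N, means))
print(round(F, 6), round(theta, 6))
```

Since $b_0$ is a weighted regression slope, it does not depend on the origin of $X$, which is why the sketch can work with lengths in their original units rather than measured from $\bar X$.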


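Similarly for the case where one of the $n_i$'s equals $N_i$, for example, $n_1 = N_1$, we find

$$F = N[\bar y_{1.} + b(\bar X - X_1)], \qquad (17)$$

in which

$$b = \frac{\sum_{i=2}^{s} w_i (X_i - X_1)(\bar y_{i.} - \bar y_{1.})}{\sum_{i=2}^{s} w_i (X_i - X_1)^2}.$$

Also

$$\sigma_F^2 = \frac{N^2 (\bar X - X_1)^2}{\sum_{i=2}^{s} w_i (X_i - X_1)^2}. \qquad (18)$$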

It should be emphasized that $X_1$ in the above formulae does not necessarily represent the smallest of the $X$'s but the one of them for which $n_1 = N_1$.

The case where two or more of the $n_i$'s are respectively equal to the corresponding $N_i$'s need not be considered in detail. Together with the assumption of a strict linearity of regression, such an assumption, for example, that $n_1 = N_1$ and $n_2 = N_2$, would lead to the conclusion that the regression of volume on the length of strip is accurately known and that the estimation of $\theta$ could be made





without error. Owing to the fact that the hypothesis about the linearity of regression is, at best, only approximately correct, the errors of estimation will always be present, and it is imperative either to arrange the sampling so as to have at most one of the $n_i$'s equal to the corresponding $N_i$, or to base the statistical treatment of the sample on a theory different from the one considered here.


6. Additional hypotheses concerning $S_i^2$. The formulae (16), with the $w_i$'s determined by (11), are impossible to apply in practice because we do not know the values of the $S_i^2$. The best we can do is to make plausible guesses as to what may be the values of the $S_i^2$. These guesses are bound to be at most approximately correct, and therefore the estimates of $\theta$ that one can apply in practice will be only "approximately best." It is easy to see, however, that we may keep them unbiased.

Suppose that we denote by $r_i^2$ the presumed value of $S_i^2$. Substituting this value in place of $S_i^2$ in (8) we should repeat all the calculations, leading us to such $\lambda_i'$ as will assure the unbiasedness of, say,

$$F_1 = \sum_{i=1}^{s} \lambda_i' \bar y_{i.},$$

and also a minimum value of, say,

$$\sum_{i=1}^{s} \frac{r_i^2 (N_i - n_i)}{n_i (N_i - 1)}\, \lambda_i'^2.$$

The values of the $\lambda_i'$ will be obtained from the same formulae as those of $\lambda_i$, except that instead of $S_i^2$ they will depend on $r_i^2$. Consequently $F_1$ will have the same form as $F$,

(19) F, = JW + &J(£ - *')], 


with the difference that x', y', S' x , and l/ 0 will now have to be calculated with 
different weights, say 


v t 


_ % - 
(IV, - 


1 )n.- _ 

U()r\' 



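Although the unbiased estimate $F_\tau$ has the same form as F, the square of its standard error, $\sigma_\tau^2$, is more complicated. In order to calculate it we have to go back to (8) and substitute into it the new values of $\lambda_i'$ obtained from the guessed weights $\tau_i^2$, with

$$\lambda_i' = \frac{N_i - 1}{(N_i - n_i)\,\tau_i^2}\,(\alpha' + \beta' X_i), \qquad \alpha' = \frac{N}{V}\left(1 + \frac{x'^2}{S_x'^2}\right), \qquad \beta' = -\frac{N x'}{V S_x'^2}, \tag{20}$$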



we have

$$\sigma_\tau^2 = \sum_{i=1}^{s} S_i^2 \, \frac{n_i (N_i - n_i)}{N_i - 1} \, \lambda_i'^2 = \sum_{i=1}^{s} v_i p_i (\alpha' + \beta' X_i)^2, \tag{21}$$

where $p_i = S_i^2/\tau_i^2$. It will now be helpful to introduce notation for another kind of weighted mean and weighted (S.D.) of the X's, with weights equal to $v_i p_i$. So let us write

$$x'' = \frac{\sum v_i p_i X_i}{\sum v_i p_i}, \qquad S_x''^2 = \frac{\sum v_i p_i X_i^2}{\sum v_i p_i} - x''^2. \tag{22}$$


Expanding (21) and using (20), we have

$$\sigma_\tau^2 = \frac{N^2 \sum v_i p_i}{V^2}\left\{\left[1 + \frac{x'(x' - x'')}{S_x'^2}\right]^2 + \frac{x'^2 S_x''^2}{S_x'^4}\right\}. \tag{23}$$

Formula (23) refers to the case where the X's are measured from their population mean, $\bar X$. In order to reduce it to the case where the X's are given in their original values, we have to substitute $(x' - \bar X)$ for $x'$ and $(x'' - \bar X)$ for $x''$. Thus

$$\sigma_\tau^2 = \frac{N^2 \sum v_i p_i}{V^2}\left\{\left[1 + \frac{(x' - \bar X)(x' - x'')}{S_x'^2}\right]^2 + \frac{(x' - \bar X)^2 S_x''^2}{S_x'^4}\right\}. \tag{24}$$
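As a numerical check on (20), (21) and (24), the Python sketch below (function names are illustrative, not from the paper) builds the $\lambda_i'$ from guessed values $\tau_i^2$ and compares the direct variance (21) with the closed form (24). It assumes every $n_i < N_i$ and at least two distinct lengths; the "true" $S_i^2$ and all other numbers are made up, since in practice the $S_i^2$ are unknown.

```python
def lambdas(X, n, Npop, tau2, N, X_pop_mean):
    """Coefficients lambda'_i of formula (20), with X measured from X-bar."""
    Xc = [x - X_pop_mean for x in X]
    v = [ni * (Ni - 1) / ((Ni - ni) * t) for ni, Ni, t in zip(n, Npop, tau2)]
    V = sum(v)
    x1 = sum(vi * xi for vi, xi in zip(v, Xc)) / V                 # x'
    S1 = sum(vi * xi ** 2 for vi, xi in zip(v, Xc)) / V - x1 ** 2  # S_x'^2
    alpha = N / V * (1 + x1 ** 2 / S1)
    beta = -N * x1 / (V * S1)
    return [(Ni - 1) / ((Ni - ni) * t) * (alpha + beta * xi)
            for ni, Ni, t, xi in zip(n, Npop, tau2, Xc)]


def var_direct(X, n, Npop, tau2, S2, N, X_pop_mean):
    """First form of (21): sum of S_i^2 n_i (N_i - n_i)/(N_i - 1) lambda_i'^2."""
    lam = lambdas(X, n, Npop, tau2, N, X_pop_mean)
    return sum(s * ni * (Ni - ni) / (Ni - 1) * li ** 2
               for s, ni, Ni, li in zip(S2, n, Npop, lam))


def var_closed_form(X, n, Npop, tau2, S2, N, X_pop_mean):
    """Formula (24), with X in original units."""
    v = [ni * (Ni - 1) / ((Ni - ni) * t) for ni, Ni, t in zip(n, Npop, tau2)]
    vp = [vi * s / t for vi, s, t in zip(v, S2, tau2)]             # v_i p_i
    V, P = sum(v), sum(vp)
    x1 = sum(a * x for a, x in zip(v, X)) / V                      # x'
    S1 = sum(a * x ** 2 for a, x in zip(v, X)) / V - x1 ** 2       # S_x'^2
    x2 = sum(a * x for a, x in zip(vp, X)) / P                     # x''
    S2pp = sum(a * x ** 2 for a, x in zip(vp, X)) / P - x2 ** 2    # S_x''^2
    d = x1 - X_pop_mean
    return N ** 2 * P / V ** 2 * ((1 + d * (x1 - x2) / S1) ** 2 + d ** 2 * S2pp / S1 ** 2)


X, n, Npop = [1.0, 2.0, 3.0, 4.0], [2, 3, 2, 1], [5, 9, 6, 3]
N, Xbar = 23, 53 / 23
tau2 = [1.0, 1.5, 2.0, 1.2]   # guessed values (hypothetical)
S2 = [0.8, 1.7, 2.5, 1.0]     # "true" values (hypothetical)
lam = lambdas(X, n, Npop, tau2, N, Xbar)
```

With these numbers one can confirm that $\sum n_i\lambda_i' = N$ and $\sum n_i\lambda_i' X_i = N\bar X$ (the unbiasedness conditions), that multiplying every $\tau_i^2$ by the same constant leaves the $\lambda_i'$ unchanged, so that only numbers proportional to the $S_i^2$ need be guessed, and that setting $\tau_i^2 = S_i^2$ gives a variance no larger than the guessed weights do, as the best linear unbiased estimate should.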


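Applying a similar procedure to the case where $n_1 = N_1 = 1$ but $n_i < N_i$ for $i = 2, 3, \dots, s$, we easily find

$$F_\tau = N\left[y_{1\cdot} + b'(\bar X - X_1)\right], \qquad b' = \frac{\sum_{i=2}^{s} v_i (X_i - X_1)(y_{i\cdot} - y_{1\cdot})}{\sum_{i=2}^{s} v_i (X_i - X_1)^2}, \tag{25}$$

and

$$\sigma_\tau^2 = N^2 (\bar X - X_1)^2 \, \frac{\sum_{i=2}^{s} v_i p_i (X_i - X_1)^2}{\left[\sum_{i=2}^{s} v_i (X_i - X_1)^2\right]^2}. \tag{26}$$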


This formula will help us to test the appropriateness of guesses about the values of $S_i^2$. It will be noticed that the λ's contain $S_i^2$ or $\tau_i^2$ in the same powers in the numerator and in the denominator. It follows that all we need to guess is a system of numbers proportional to the $S_i^2$, and not the $S_i^2$ themselves. Our problem will be to test a few such guesses on the data of the Blacks Mountain Experimental Forest and see which of them generally gives a smaller value of $\sigma_\tau^2$.

TABLE II
Values of $S_i^2$

[Table body not recoverable from the source scan.]

Table II gives values of the $S_i^2$, calculated for 15 blocks, together with the corresponding $N_i$. In a few cases $N_i = 1$ and consequently $S_i = 0$; these cases are not included in the table. Using the values of $S_i^2$ from Table II and assuming systems of the $n_i$'s, the values of $\sigma_F^2$ were calculated for these blocks. These would be the true (S.E.)² of the best linear estimates of the total timber volume in each block, but it would never be possible to calculate them from sample data.

The $\sigma_\tau^2$'s were calculated using the following guesses concerning the $S_i^2$: (1) that they do not depend on $X_i$, (2) that they are proportional to $X_i$, and (3) that they are proportional to $\sqrt{X_i}$. The ratios $\sigma_F^2/\sigma_\tau^2$ for all blocks taken together were found to be .770 for guess (1), .769 for guess (2), and .777 for guess (3). It is seen that, on the average, the guess that the $S_i^2$ are proportional to $X_i$ gives the smallest average value of $\sigma_\tau^2$. It is interesting, however, to note that the differences between the three guesses are, for all practical purposes, negligible.

Ratios like $\sigma_F^2/\sigma_\tau^2$ are sometimes described as the "amount of information" in $F_\tau$ as compared with that in the best linear unbiased estimate F. This expression was introduced by R. A. Fisher. In certain cases it has the following property, which justifies the term used: let n be the size of the sample which serves for calculating $F_\tau$; then, if it were possible to calculate the best linear unbiased estimate F, the same accuracy of estimation would be obtained by using the smaller sample size $n\sigma_F^2/\sigma_\tau^2$. In the case considered in the present paper this circumstance does not occur. Still, the ratio $\sigma_F^2/\sigma_\tau^2$ seems convenient for describing the situation.

7. Another scheme for estimating θ. It will be noticed that ignorance of the $S_i^2$ is not the only circumstance which makes it difficult to apply the above formulae. There is also another one, connected with the values of $N_i$. We have $N_i = 1$ in several blocks and for several different strip lengths. True, this might have been avoided by defining block boundaries in such a way that $N_i \ge 2$, but it was considered best to conform strictly to the practical situation, where the $N_i$'s may be smaller. In such cases we may include in our sample all the strips of a given length, say $X_1$. If we apply to such samples the above formulae, deduced under the explicit assumption that the regression of Y on X is strictly linear, we shall force the fitted regression line through the point $(X_1, y_{1\cdot})$. As the assumption of strict linearity is obviously not exact, and the exhaustion of strips of length $X_1$ is possible only when there are very few such strips, the whole procedure may lead to serious inaccuracies in the final estimate. One safeguard against this is never to exhaust strips of any given length when dealing with formulae deduced for finite populations.

The fact that the true regression point $(X_i, \bar Y_i)$ does not actually lie on a straight line makes it uncertain whether taking into account the finiteness of populations of strips of the same length is beneficial to the accuracy of the final estimate. In the preceding sections we worked on the assumption that there is but a finite number of strips of the same length, and on the inaccurate assumption that the regression is strictly linear. In the present section the first assumption will be dropped, in the expectation that the effect of the inaccuracy of the second assumption may thereby be reduced.

The assumption that each of the $N_i$ is infinite will be made only in deducing the $\lambda_i$ and will be reflected in the weights. Formula (11) will now reduce to $w_i = n_i/S_i^2$. If we assume further that $S_i^2 = X_i^\gamma/k$, where γ and k are some constants, then

$$w_i = \frac{k n_i}{X_i^\gamma}, \qquad W = \sum_{i=1}^{s} w_i = k \sum_{i=1}^{s} \frac{n_i}{X_i^\gamma},$$

and the final estimate is

$$F = N\left[\bar y + b_0(\bar X - \bar x)\right]. \tag{27}$$


The square of the standard error of F again has the same form as in (16),

$$\sigma_F^2 = \frac{N^2}{W}\left[1 + \frac{(\bar x - \bar X)^2}{S_x^2}\right], \tag{28}$$

the only differences being in W, $\bar x$, and $S_x$. If γ = 0, so that the $S_i^2$ are assumed to be constant, then

$$w_i = k n_i, \qquad W = k \sum n_i,$$

and all the symbols $\bar x$, $\bar y$, and $S_x$ assume their customary meaning of ordinary means and of ordinary (S.D.).

It would be easy to deduce explicit formulae for γ = 1/2, etc., but they are not elegant and, if the necessity arises, the calculations could be carried through by starting with $w_i = n_i/X_i^\gamma$. The omission of k does not influence the form of F.
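Under the infinite-population weights $w_i = k n_i/X_i^\gamma$, formula (27) is a weighted regression estimate computed from the group means $y_{i\cdot}$. The sketch below (names and data hypothetical) computes it; with γ = 0 it should coincide with N times the ordinary least-squares prediction at $\bar X$ fitted to the individual strips, and it should not depend on k.

```python
def infinite_population_estimate(groups, X, gamma, N, X_pop_mean, k=1.0):
    """Formula (27) with weights w_i = k * n_i / X_i**gamma.

    groups[i] holds the sampled volumes y_ij for strips of length X[i].
    """
    n = [len(g) for g in groups]
    w = [k * ni / x ** gamma for ni, x in zip(n, X)]
    W = sum(w)
    ym = [sum(g) / len(g) for g in groups]                   # y_i.
    xbar = sum(wi * xi for wi, xi in zip(w, X)) / W
    ybar = sum(wi * yi for wi, yi in zip(w, ym)) / W
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, X, ym))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, X))
    b0 = sxy / sxx
    return N * (ybar + b0 * (X_pop_mean - xbar))


groups = [[3.0, 4.0], [6.0, 8.0, 7.0], [9.5, 11.0], [13.0]]
X = [1.0, 2.0, 3.0, 4.0]
F0 = infinite_population_estimate(groups, X, gamma=0, N=23, X_pop_mean=53 / 23)
```

For these made-up data the ordinary least-squares line through the eight strips has slope 193/60 and passes through (2.25, 7.6875), so F0 equals 1085/6 ≈ 180.83; repeating the call with any other k gives the same value.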

The question whether the combination of one true hypothesis, that the $N_i$ are finite, with another, incorrect one, that the regression is strictly linear, is better than that of two incorrect hypotheses will be studied by means of a sampling experiment in Section 9.


8. Unbiased estimates of $\sigma_F^2$. While it may not be unreasonable to hope that a guess of a system of numbers proportional to the $S_i^2$ may be successful, it is entirely hopeless to try to guess the actual values of the $S_i^2$. It follows that, if it is desired to obtain from the sample some sort of measure of the accuracy of F, we have to calculate an estimate of $\sigma_F^2$.

We shall treat the problem by assuming that the regression of Y on X is strictly linear, that the $S_i^2$ are proportional to $X_i^\gamma$, and that the $N_i$ are all finite. It will be noticed that the $N_i$ will enter the formulae by means of the ratios $(N_i - 1)/(N_i - n_i)$. If it is desired to obtain formulae referring to the assumption of infinite $N_i$'s, it will be sufficient to replace these ratios by unity. Of course, the symbol N will always represent the total number of strips in the actual block in which it is desired to estimate the volume of timber, and will not be affected by the assumption of the $N_i$'s being infinite.

On these assumptions

$$E(y_{ij}) = A + BX_i, \qquad \sigma_{y_{ij}}^2 = E(y_{ij} - A - BX_i)^2 = S_i^2 = kX_i^\gamma,$$

with some value of γ supposed to be accurately guessed, which, however, need not be specified, and with an unknown factor of proportionality, k. The square of the standard error of $y_{i\cdot}$ is then known to be


$$\sigma_{y_{i\cdot}}^2 = \frac{S_i^2}{n_i}\,\frac{N_i - n_i}{N_i - 1} = \frac{kX_i^\gamma (N_i - n_i)}{n_i (N_i - 1)}. \tag{29}$$


The right-hand member of (29) is equal to the reciprocal of what we have formerly denoted by $w_i$ and described as the weight of the observations at $X = X_i$. We have mentioned above that the formula giving F does not depend on the values of the $w_i$, but only on the proportions between them. In other words, if we drop the unknown factor k and denote by $w_i$ the ratio

$$w_i = \frac{n_i (N_i - 1)}{X_i^\gamma (N_i - n_i)}, \tag{30}$$

which involves only known quantities, these new weights will lead to exactly the same value of F as the original weights. It will now be convenient to alter the definition of weight and use formula (30). With this new meaning of $w_i$, (29) could be rewritten $\sigma_{y_{i\cdot}}^2 = k/w_i$.

Let us further use the letter m to denote the number of those $X_i$'s for which we have at least one observation. In other words, m will be the number of different lengths of strips in the sample and also the number of different $y_{i\cdot}$'s that we are going to calculate from it.

Now let us go back to formula (16), giving the square of the standard error of F. We notice that, while $\bar x$ and $S_x$ do not depend on the unknown factor of proportionality, k, the sum W of the original weights does depend on it, and with our new meaning of $w_i$,

$$W = \frac{1}{k}\sum_{i=1}^{m} w_i .$$

It follows that $\sigma_F^2$ should now be written in the form

$$\sigma_F^2 = \frac{kN^2}{\sum_{i=1}^{m} w_i}\left[1 + \frac{(\bar x - \bar X)^2}{S_x^2}\right], \tag{31}$$

and that, in order to estimate $\sigma_F^2$, it is sufficient to get an estimate of k. We easily get an unbiased estimate of k by merely applying the second part of the Markoff Theorem [3]. According to it, an unbiased estimate of k, based on m − 2 degrees of freedom, is given by the ratio

$$\frac{\sum_{i=1}^{m} w_i \left[y_{i\cdot} - \bar y - b_0(X_i - \bar x)\right]^2}{m - 2}, \tag{32}$$

in which $\bar x$, $\bar y$, and $b_0$ are calculated according to the assumptions made regarding $N_i$ and γ. It may be expected, however, that the estimate (32) will not be a very accurate one, because the number of degrees of freedom on which it is based may be very small.

In an attempt to find a better estimate of k we shall proceed by analogy and calculate the expectation of a sum similar to the one in the numerator of (32), but depending explicitly on the individual $y_{ij}$'s, namely of

$$S_0^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[y_{ij} - \bar y - b_0(X_i - \bar x)\right]^2 \frac{w_i}{n_i}.$$

It will be noticed that if the $N_i$ are finite, $y_{ij}$ and $y_{il}$ are dependent, and that the Theorem of Markoff does not apply to $S_0^2$. Introduce the notation

$$\eta_{ij} = y_{ij} - A - BX_i, \qquad \eta_{i\cdot} = \frac{1}{n_i}\sum_{j=1}^{n_i}\eta_{ij} = y_{i\cdot} - A - BX_i .$$


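Easy but somewhat long calculations show that $S_0^2$ can be rewritten in the form

$$S_0^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\frac{w_i}{n_i}\,\eta_{ij}^2 - \frac{1}{\sum_{i=1}^{m} w_i}\left[\left(\sum_{i=1}^{m} w_i \eta_{i\cdot}\right)^2 + \frac{1}{S_x^2}\left(\sum_{i=1}^{m} w_i (X_i - \bar x)\,\eta_{i\cdot}\right)^2\right],$$

which is most convenient for calculating the expectation sought. We notice first that

$$E(\eta_{ij}^2) = kX_i^\gamma, \qquad E(\eta_{i\cdot}^2) = \sigma_{y_{i\cdot}}^2 = \frac{k}{w_i}.$$

Further, as $y_{i\cdot}$ and $y_{j\cdot}$ are mutually independent if $i \ne j$, the same is true for $\eta_{i\cdot}$ and $\eta_{j\cdot}$. It follows that

$$E(\eta_{i\cdot}\,\eta_{j\cdot}) = 0 \qquad (i \ne j).$$

Consequently

$$E\left(\sum_{i=1}^{m} w_i \eta_{i\cdot}\right)^2 = \sum_{i=1}^{m} w_i^2\, E(\eta_{i\cdot}^2) = k\sum_{i=1}^{m} w_i .$$

Similarly, and for the same reason,

$$E\left[\sum_{i=1}^{m} w_i (X_i - \bar x)\,\eta_{i\cdot}\right]^2 = kS_x^2 \sum_{i=1}^{m} w_i .$$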


It follows that

$$E(S_0^2) = k\left[\sum_{i=1}^{m}\frac{n_i (N_i - 1)}{N_i - n_i} - 2\right],$$

and that the ratio

$$\frac{S_0^2}{\sum_{i=1}^{m}\frac{n_i (N_i - 1)}{N_i - n_i} - 2} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[y_{ij} - \bar y - b_0(X_i - \bar x)\right]^2 \frac{w_i}{n_i}}{\sum_{i=1}^{m}\frac{n_i (N_i - 1)}{N_i - n_i} - 2} \tag{33}$$

is an unbiased estimate of k. In cases where either all $n_i = 1$ or all $N_i$ are infinite, the denominator of (33) reduces to the number of degrees of freedom on which $S_0^2$ is based, equal to $\sum n_i - 2$. In other cases the denominator of (33) is greater than the number of degrees of freedom. Whether the numerical difference is large or small depends on the fractions $(N_i - 1)/(N_i - n_i)$. We may expect that in many practical cases it will be small.

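We shall write

$$S_y^2 = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}\frac{w_i}{n_i}(y_{ij} - \bar y)^2}{\sum_{i=1}^{m} w_i}, \qquad r = \frac{\sum_{i=1}^{m} w_i (X_i - \bar x)\, y_{i\cdot}}{S_x S_y \sum_{i=1}^{m} w_i}.$$

It follows that

$$S_0^2 = S_y^2 (1 - r^2) \sum_{i=1}^{m} w_i .$$

Substituting this formula into (33), and then the result of this substitution in place of k in (31), we finally get

$$\mu_F^2 = \frac{N^2 S_y^2 (1 - r^2)}{\sum_{i=1}^{m}\frac{n_i (N_i - 1)}{N_i - n_i} - 2}\left[1 + \frac{(\bar x - \bar X)^2}{S_x^2}\right]. \tag{34}$$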
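A sketch of the variance estimate (33)-(34) follows. It takes the weights from (30), treats a length whose $N_i$ is regarded as infinite (passed as `None`, an illustrative convention) as having ratio $(N_i - 1)/(N_i - n_i) = 1$, and returns F together with the pieces of (33) and (34), so that the identity $S_0^2 = S_y^2(1 - r^2)\sum w_i$ and the agreement of (34) with (31) can be checked. It presumes all $n_i < N_i$ for finite $N_i$ and a positive denominator in (33); names and data are hypothetical.

```python
def fpc_ratio(n_i, N_i):
    """(N_i - 1)/(N_i - n_i); replaced by unity when N_i is None (treated as infinite)."""
    return 1.0 if N_i is None else (N_i - 1) / (N_i - n_i)


def variance_estimate(groups, X, Npop, gamma, N, X_pop_mean):
    n = [len(g) for g in groups]
    c = [fpc_ratio(ni, Ni) for ni, Ni in zip(n, Npop)]
    w = [ni * ci / x ** gamma for ni, ci, x in zip(n, c, X)]          # formula (30)
    W = sum(w)
    ym = [sum(g) / len(g) for g in groups]                            # y_i.
    xbar = sum(wi * xi for wi, xi in zip(w, X)) / W
    ybar = sum(wi * yi for wi, yi in zip(w, ym)) / W
    Sx2 = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, X)) / W
    b0 = sum(wi * (xi - xbar) * yi for wi, xi, yi in zip(w, X, ym)) / (W * Sx2)
    # S_0^2: weighted squared residuals of the individual y_ij about the fitted line
    S0 = sum(wi / ni * (y - ybar - b0 * (xi - xbar)) ** 2
             for g, wi, ni, xi in zip(groups, w, n, X) for y in g)
    denom = sum(ni * ci for ni, ci in zip(n, c)) - 2
    k_hat = S0 / denom                                                # formula (33)
    Sy2 = sum(wi / ni * (y - ybar) ** 2
              for g, wi, ni in zip(groups, w, n) for y in g) / W
    r = sum(wi * (xi - xbar) * yi for wi, xi, yi in zip(w, X, ym)) / (W * (Sx2 * Sy2) ** 0.5)
    lack = 1 + (xbar - X_pop_mean) ** 2 / Sx2
    return {
        "F": N * (ybar + b0 * (X_pop_mean - xbar)),
        "S0": S0, "W": W, "Sy2": Sy2, "r": r, "k_hat": k_hat, "denom": denom,
        "mu2": N ** 2 * Sy2 * (1 - r ** 2) / denom * lack,            # formula (34)
        "mu2_via_31": N ** 2 * k_hat / W * lack,                      # (31) with k -> k_hat
    }


groups = [[3.0, 4.0], [6.0, 8.0, 7.0], [9.5, 11.0], [13.0]]
X = [1.0, 2.0, 3.0, 4.0]
res = variance_estimate(groups, X, [5, 9, 6, 3], gamma=0, N=23, X_pop_mean=53 / 23)
```

Passing `[None] * 4` instead of the finite $N_i$ reproduces the infinite-population case, in which the denominator of (33) becomes $\sum n_i - 2$.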

The case where one of the $n_i$ is equal to the corresponding $N_i$, e.g., where $n_1 = N_1 = 1$, is treated in a similar manner. Using formula (18) and the notation adopted above, we can write

$$\sigma_F^2 = k\,\frac{N^2 (\bar X - X_1)^2}{\sum_{i=2}^{m} w_i (X_i - X_1)^2}.$$


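The unbiased estimate $\mu_F^2$ of $\sigma_F^2$ will differ from this expression in that, instead of the unknown factor k, it will contain its unbiased estimate. To find this estimate we proceed exactly as above and calculate the expectation of

$$S_1^2 = \sum_{i=2}^{m}\sum_{j=1}^{n_i}\left[y_{ij} - y_{1\cdot} - b_0(X_i - X_1)\right]^2 \frac{w_i}{n_i},$$

with

$$b_0 = \frac{\sum_{i=2}^{m} w_i (X_i - X_1)(y_{i\cdot} - y_{1\cdot})}{\sum_{i=2}^{m} w_i (X_i - X_1)^2}.$$

The unbiased estimate of $\sigma_F^2$ is

$$\mu_F^2 = \frac{N^2 (\bar X - X_1)^2}{\sum_{i=2}^{m} w_i (X_i - X_1)^2}\cdot\frac{S_1^2}{\sum_{i=2}^{m}\frac{n_i (N_i - 1)}{N_i - n_i} - 1}. \tag{35}$$

The number of degrees of freedom, f, on which $\mu_F^2$ is based is

$$f = \sum_{i=2}^{m} n_i - 1 .$$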


9. Empirical tests of the preceding theory. Applications of any mathematical theory involve certain assumptions about the phenomena studied that are not exactly true. In order to have a reasonable hope that the predictions of the theory will be comparable to the actual facts, we must perform empirical tests and see whether such deviations from the underlying assumptions as are usually met in practice materially influence the working of a given theory. Our object in the present section will be to test whether, and to what extent, such deviations influence the applicability of the theory. For that purpose it will be useful to enumerate the more important uses of the theory that are likely to be made.

The first point refers to the choice of the standard error $\sigma_F$ of the best linear estimate F as the measure of the accuracy with which F estimates the unknown volume of timber, θ. If all the assumptions were true, the Theorem of Liapounoff would guarantee that, when the size of the sample, $\sum n_i$, is only moderately large, the frequency distribution of the ratio

$$(F - \theta)/\sigma_F \tag{36}$$

would be very approximately normal about zero with unit S.E. If this were actually true, then the value of $\sigma_F$ would be a justifiable basis for the choice between various alternative estimates of θ. However, the discrepancies between the hypotheses underlying the theory and the actual facts may easily produce a bias in F, or may deprive $\sigma_F$ of the above important property.



Therefore, the first thing we have to test is whether, in such conditions as are actually met in practice, the ratio (36) is in fact distributed in repeated sampling in a way that is comparable with the normal law. The data of the 100 per cent survey of the Blacks Mountain Experimental Forest will serve us for the test.

The second important application of the theory is connected with the use of $\mu_F$. The purpose of calculating $\mu_F$ is to characterize the accuracy of the value of F obtained from the sample. The most appropriate way of doing so is to calculate the confidence interval for θ. This has the form [5]

$$F - t_\alpha \mu_F \le \theta \le F + t_\alpha \mu_F, \tag{37}$$

in which $t_\alpha$ denotes the "Student"-Fisher t taken in accordance with the number of degrees of freedom in $\mu_F$ and the chosen value of P. The confidence interval has the property that, if calculated for a great number of samples, the frequency with which the true value of θ will lie between the limits $F \pm t_\alpha \mu_F$ will approach the value $\alpha = 1 - P$, defined as the confidence coefficient.
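Interval (37) and the long-run frequency it is meant to control can be sketched as follows. The quantile `t_alpha` must be read from a table of "Student's" t for the degrees of freedom of $\mu_F$ and the chosen P; it is passed in rather than computed, and for simplicity one value is used for all samples, whereas in practice the degrees of freedom may differ from sample to sample. The helper names are hypothetical.

```python
def confidence_interval(F, mu_F, t_alpha):
    """Limits of interval (37): F - t_alpha*mu_F <= theta <= F + t_alpha*mu_F."""
    return F - t_alpha * mu_F, F + t_alpha * mu_F


def coverage(results, t_alpha):
    """Fraction of (F, mu_F, theta) triples whose interval (37) contains theta.

    Over many samples this fraction should approach the confidence coefficient.
    """
    hits = 0
    for F, mu_F, theta in results:
        lower, upper = confidence_interval(F, mu_F, t_alpha)
        hits += lower <= theta <= upper
    return hits / len(results)
```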

The above statement concerning the confidence coefficient is strictly true if, apart from the various hypotheses that were enumerated, the distribution of the y's is normal. As a result of a theorem by Kozakiewicz [6], the same statement will be approximately true also for non-normally distributed $y_{ij}$'s, on condition that the sample sizes are considerable. In the situation where the above theory is to be applied, these assumptions are not all satisfied. Still, the formula for the confidence interval may well work, but before accepting this we must have some experimental evidence. The crucial point that it must cover is whether the ratio, say,

$$t = (F - \theta)/\mu_F, \tag{38}$$

does or does not follow in repeated sampling a distribution sufficiently close to the theoretical one, known as "Student's" distribution. If the empirical distribution of t does approach "Student's" law, then the frequency of correct statements concerning θ in the form (37) will be approximately equal to the chosen α, and conversely.

The $n_i$'s for this experiment were fixed according to the systems shown in Table III, with all X's having a chance of appearing in the samples, and with the $n_i$'s quite closely proportional to the $N_i$'s and approximately 25 per cent of the latter. Random sampling numbers [7] were used in selecting the $n_i$ strips out of each group of strips. A total of 150 block samples were drawn, equally distributed among the 15 blocks.

TABLE III
Systems of $n_i$'s for sampling experiment

There are 95 samples for the case where all $n_i < N_i$. For these, formula (19) was used to calculate F, and formula (24) for $\sigma_F$, using the guess that the $S_i^2$ are constant over all strip lengths. On the hypothesis that the ratio (36) is normally distributed about zero with unit S.E., we divide the range of variation of possible values of (36) into 20 intervals such that, if the hypothesis is true, the probability of an observed value falling in any particular interval is equal to .05. For 95 samples, then, the expected frequency in each interval is 4.75. The observed frequencies are shown in Table IV.
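The subdivision into 20 intervals of equal probability under the normal hypothesis can be sketched with the standard library's `statistics.NormalDist`; the function names are illustrative. Each observed ratio (36) is assigned to the interval that contains it, and the counts $n_k$ are then compared with the expected $95 \times .05 = 4.75$.

```python
import bisect
from statistics import NormalDist


def equal_probability_boundaries(intervals=20):
    """Interior cut points a_1 < ... < a_(intervals - 1) of the unit normal law."""
    std = NormalDist(0.0, 1.0)
    return [std.inv_cdf(k / intervals) for k in range(1, intervals)]


def interval_counts(values, intervals=20):
    """n_k: how many of the observed ratios fall in each equal-probability interval."""
    cuts = equal_probability_boundaries(intervals)
    counts = [0] * intervals
    for v in values:
        counts[bisect.bisect_right(cuts, v)] += 1
    return counts
```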




The agreement between the observed and the hypothetical distribution is tested by means of the fourth order smooth test for goodness of fit [8]. The test is designed so as to be particularly sensitive to such deviations from the hypothetical distribution as could be described as "smooth." It is used here because it is expected that, if the actual distribution of the ratios considered does differ markedly from the normal or from "Student's" one, the curve representing this actual distribution would still be a "smooth" one, presumably with a single mode, and would cross the hypothetical curves at only a few points. There is empirical evidence to show [9] that in such cases the smooth test of fourth order is more powerful than the usual χ² test.

TABLE IV
Frequency distribution⁴ of $(F - \theta)/\sigma_F$ and $(F - \theta)/\mu_F$ calculated under various assumptions

Columns:
(1) finite population of strips, $(F - \theta)/\sigma_F$, all $n_i < N_i$
(2) finite population of strips, $(F - \theta)/\sigma_F$, one $n_i = N_i$
(3) finite population of strips, $(F - \theta)/\mu_F$, all $n_i < N_i$
(4) finite population of strips, $(F - \theta)/\mu_F$, one $n_i = N_i$
(5) infinite population of strips, $(F - \theta)/\mu_F$, all $n_i < N_i$
(6) infinite population of strips, $(F - \theta)/\mu_F$, one or more $n_i = N_i$
(7) infinite population of strips, $(F - \theta)/\mu_F$, total

              (1)      (2)      (3)      (4)      (5)      (6)      (7)
Total          95       44       95       44       95       55      150
ψ₄²         1.326   33.812    5.463   13.091
P             .85     <.01      .25      .01
P(χ²)         .57     <.01      .63      .09

[The frequencies n_k for the 20 intervals are not recoverable from the source scan.]

⁴ By 20 intervals of equal probability.


The criterion used in the smooth test of the fourth order is denoted by $\psi_4^2$. If the hypothesis tested is true, then $\psi_4^2$ is distributed approximately as $\chi^2$ with 4 degrees of freedom. To calculate $\psi_4^2$ we proceed as follows.

Let x be a random variable and H denote the hypothesis that the distribution of x is given by a perfectly specified function f(x). The range of variation of x is divided into 2s = 20 intervals,

$$(-\infty, a_1),\ (a_1, a_2),\ \dots,\ (a_{2s-1}, +\infty),$$

so that, if H is true, the probability of x falling within any such interval is exactly equal to .05. Such a subdivision can frequently be made easily from appropriate tables for f(x). We associate with these intervals a variate z whose value corresponding to the kth interval will be

$$z_k = \frac{2k - 1}{4s} - \frac{1}{2} = \frac{2(k - s) - 1}{4s}, \qquad k = 1, 2, \dots, 2s.$$

It will be seen that if we start at the point $a_s$ and follow the intervals to the right and to the left, the corresponding values of z will be

$$z = \pm\frac{1}{4s},\ \pm\frac{3}{4s},\ \dots,\ \pm\frac{2s - 1}{4s}.$$


Consideration of the variable z is then substituted for that of the observed values $x_1, x_2, \dots, x_n$ of x. If any value $x_m$ falls in the kth interval, $a_{k-1} < x_m < a_k$, this is interpreted as an observation of z which yielded the value $z_k$. Let $n_k$ denote the number of observed x's which fall in the interval $(a_{k-1}, a_k)$, and let the Gaussian symbol $[z^i]$ stand for the sum $[z^i] = \sum_{k=1}^{2s} n_k z_k^i$. To apply the fourth order smooth test such sums have to be calculated for i = 1, 2, 3, 4. They are then substituted into the equations below, deduced under the assumption that the number of intervals of subdivision of the range of x is equal to 2s = 20.

$$u_1^2 = n^{-1}\left(3.468440\,[z]\right)^2,$$

$$u_2^2 = n^{-1}\left(13.500884\,[z^2] - 1.122261\,n\right)^2,$$

$$u_3^2 = n^{-1}\left(53.857548\,[z^3] - 8.031507\,[z]\right)^2,$$

$$u_4^2 = n^{-1}\left(218.148007\,[z^4] - 46.239587\,[z^2] + 1.139500\,n\right)^2.$$

Finally, $\psi_4^2 = u_1^2 + u_2^2 + u_3^2 + u_4^2$. If the calculated value of $\psi_4^2$ exceeds the tabled value of $\chi^2$ with four degrees of freedom corresponding to the chosen level of significance, then the hypothesis tested, H, should be rejected.⁵

⁵ The above expressions for the u's differ a little from those published in the original paper on the smooth test because in the latter the test was designed to apply only to ungrouped observations. The present formulae, obtained in the Statistical Laboratory of the University of California, appear in print for the first time. Obviously, if the number of intervals 2s is increased, the formulae for grouped data will approach those for ungrouped ones.
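The grouped-data smooth test can be transcribed directly from the $z_k$ and the coefficients above. The sketch below is such a transcription for 2s = 20 intervals (function names are illustrative). It also evaluates the upper tail of $\chi^2$ with 4 degrees of freedom, $P = e^{-x/2}(1 + x/2)$; for $\psi_4^2 = 1.326$, the first entry of Table IV, this gives about .857, in line with the tabled .85.

```python
import math


def smooth_test_psi2(counts):
    """Fourth order smooth test psi_4^2 for counts n_k in 2s = 20 equal-probability intervals."""
    s = 10
    if len(counts) != 2 * s:
        raise ValueError("expected counts for 20 intervals")
    z = [(2 * (k - s) - 1) / (4 * s) for k in range(1, 2 * s + 1)]   # z_k
    n = sum(counts)
    # Gaussian symbols [z^i] = sum over k of n_k * z_k**i
    zs = {i: sum(nk * zk ** i for nk, zk in zip(counts, z)) for i in (1, 2, 3, 4)}
    u1 = 3.468440 * zs[1]
    u2 = 13.500884 * zs[2] - 1.122261 * n
    u3 = 53.857548 * zs[3] - 8.031507 * zs[1]
    u4 = 218.148007 * zs[4] - 46.239587 * zs[2] + 1.139500 * n
    return (u1 ** 2 + u2 ** 2 + u3 ** 2 + u4 ** 2) / n


def chi2_4df_upper_tail(x):
    """P(chi^2 with 4 degrees of freedom > x) = exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2) * (1 + x / 2)
```

A perfectly flat set of counts gives $\psi_4^2$ essentially zero, while piling all observations into one extreme interval gives a value far beyond the 5 per cent point of $\chi^2$ with 4 degrees of freedom (9.488).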



The agreement between the observed distribution and the expected distribution is shown in Table IV to be excellent, the probability of a greater difference occurring through errors of random sampling alone being .85. The corresponding P for the $\chi^2$ test, where consecutive pairs of intervals are combined to make 10 intervals in all, is .57.

For the case where one $n_1 = N_1 = 1$ there are 44 samples. For these samples the value of F was calculated from formula (25), and the value of $\sigma_F$ from formula (26), again taking the $S_i^2$ to be constant over strip length within blocks. In this case the deviation from expectation shown in Table IV is greater than can be attributed to chance alone. The same result is obtained by the $\chi^2$ test, which gives $\chi^2 = 25.091$ and P < .01 on 9 degrees of freedom.

The conclusions we draw from these results, where one of the assumptions made is that the population of strips is finite, are that the block boundaries should be so defined that all $N_i > 1$ or, if this is not done, that the systems of $n_i$'s be such that no sampling is done from strips where $N_i = 1$. The fact that some $n_i = 0$ when the corresponding $N_i > 1$ has no appreciable effect on the working of the theory. In the previously described test for samples in which all $n_i < N_i$, the N used in formulae (19) and (24) always referred to all strips in the block, regardless of the fact that strips of some specified lengths $X_i$ did not appear in particular samples.

Using the same samples, the distribution of $(F - \theta)/\mu_F$ is compared, in a manner parallel to that described above, with the distribution of the "Student"-Fisher t, taking into account the number of degrees of freedom.

The formulae used for estimating $\sigma_F$, namely for calculating $\mu_F$, are (34), where all $n_i < N_i$, and (35), where one $n_1 = N_1$. The estimates of θ, namely F, remain unchanged from those previously calculated.

The results from this second application of the theory, as judged by the smooth test in Table IV, lead to the same conclusions as were drawn from the first application of the theory, namely, that under the assumption that the population of strips is finite, no $N_i$ should be exhausted in the sampling.

It is interesting to note that the application of the $\chi^2$ test to the observed distribution of $(F - \theta)/\mu_F$ corresponding to samples with one $n_1 = N_1 = 1$ did not reject the hypothesis that it follows "Student's" law. In this case the range of t was divided into 10 intervals of equal probability, and the value of $\chi^2$ obtained was 15.091. With 9 degrees of freedom this gives P of the order of .09.

The ratio (36) cannot be determined under the assumption that the population of strips is infinite where one $n_1 = N_1$, because the values of $S_i^2/\tau_i^2$ cannot be obtained for such strips. Under this assumption it is impossible to calculate $\sigma_F$ by the formulae deduced in the present study, and the first use of the theory must be omitted. However, the estimate of $\sigma_F^2$ can be calculated from the samples and the ratio (38) determined.

The estimates F were calculated using formula (27), taking $w_i = n_i$. This same formula applies whether or not one or more of the $N_i$ are exhausted. Each sample from Block 15 and one sample from Block 12 exhausted two or more strip lengths; their estimates could not be calculated under the assumptions made heretofore, but they can now be obtained under the present assumptions. The estimates $\mu_F$ were obtained from (34) for all samples, taking the $S_i^2$ as constant over all strip lengths and $w_i = n_i$. The fact that one or more $N_i$ are exhausted does not change the procedure for such samples in any way.

For the case where all $n_i < N_i$ in Table IV, the value P = .06 obtained by the $\psi_4^2$ test indicates that the agreement of the observed distribution with expectation, although not close, is acceptable. When the data are regrouped into 10 classes and the $\chi^2$ test is applied, we get P = .18 on 9 degrees of freedom.

The smooth test applied to the distribution of $(F - \theta)/\mu_F$ for samples where one or more $n_i = N_i$ indicates that the correspondence with expectation is good. This result is in marked contrast to the corresponding results obtained under the assumption of finite populations, and bears out the belief, previously expressed in Section 7 on the basis of intuitive considerations, that by dropping the assumption that the number of strips of a given length is finite, the error of the assumption of strict linearity of regression would be compensated for to some extent. On the basis of these findings we can add the conclusion that if, in sampling, the strips of a given length are exhausted, the assumption of finiteness should be dropped and the sample estimates calculated from formulae deduced under the assumption that all $N_i$ are infinite.

There remains some question as to the statistical treatment of samples in which all $n_i < N_i$, that is, whether to use formulae deduced for finite or for infinite populations. The final choice can best be based on the relative size of the confidence interval (37). Where all $n_i = 1$ the estimates are the same under both assumptions. For estimates of all blocks taken together, the finite population estimates tended to be within 5.5 per cent of θ in 95 out of 100 trials, while the corresponding percentage for infinite population estimates was 0.0. We therefore conclude that it is better to use the assumption of finiteness of $N_i$ where all $n_i < N_i$.

The method of sampling considered here is what could be called restricted random sampling. The restriction consists in this: we group together the sampling units of the same size, select several such groups nonrandomly, and only then proceed to draw at random $n_i$ units out of a group of $N_i$. Frequently the strips of the same size will be situated within the block close to one another. In those cases the restricted sampling considered will assure that the sample contains elements more or less uniformly distributed over the area of the block.

10. Summary. Several methods of sampling timber stands and statistical 
treatment of the samples were considered. Data from a complete inventory of 
the Blacks Mountain Experimental Forest served for testing the methods in 
practice. 

It was found that the usual method of estimating from strip samples taken within nonrectangular blocks of timber gave biased estimates unless the linear regression of volume on strip length passed through the origin of coordinates. It was shown that this condition was not a safe one to assume. Consequently, methods of estimation were sought which were freed from this restriction.

The appropriate formulae for the best linear unbiased estimates were deduced 
under various combinations of the following assumptions. 

(1) That the regression of timber volume on strip length is strictly linear, but 
may or may not pass through the origin of coordinates. 

(2) That the values of the (S.D.) 2 of timber volumes on strips of equal lengths 
are (a) constant for different strip lengths, (b) proportional to strip 
length, and (c) proportional to the square root of strip length. 

(3) That the number of strips of a given length in each block is (a) finite, and (b) infinite. Assumption (b) was based on intuitive considerations which indicated that this assumption, though known to be false, might compensate for another false assumption, namely, that of strict linearity of regression.

It was empirically found that assumption (b) of (2) gave better results than either (a) or (c). However, the advantage was small and, in the author's opinion, did not justify the extra labor in calculations, which are simpler when assumption (a) is made. Therefore all other calculations were made on that assumption.

An extensive sampling experiment was made to test whether the smallness of the samples, combined with the conflicts between the assumptions of the theory and the actual facts, influenced the validity of the normal theory.

Whenever the sample did not exhaust strips of a given length, it was found that the formulae based on the assumption that the populations of such strips are finite and those based on the assumption that they are infinite both work satisfactorily, generating distributions similar to those determined by the normal theory. However, the confidence intervals based on the true assumption that the populations of strips of equal length are finite proved to be narrower. Consequently, whenever the sample does not exhaust all strips of any given length in the block, the true hypothesis concerning the number of such strips should be used. Formulae (19) and (34) are therefore the appropriate ones, using weights based on finite populations.

In cases where the sample did exhaust the strips of a given length, the treatment of the number of such strips as finite, combined with the inaccuracy of the assumption that the regression of timber volume on length of strip is linear, resulted in marked disagreement between the actual distributions of statistics and those based on normal theory. This disagreement was not found to exist in statistics calculated with formulae (27) and (34) used on the assumption of an infinity of strips of a given length. This suggests the conclusion that the exhaustion of strips of a given length by the sample should be avoided and, when this is impossible, the formulae based on the assumption of an infinity of strips of a given length should be used.

The formulae deduced can be applied equally well to line plots as to strips. 
With the formulae deduced the most efficient sampling will be obtained when 








TABULATION OF THE PROBABILITIES FOR THE RATIO OF THE 
MEAN SQUARE SUCCESSIVE DIFFERENCE 
TO THE VARIANCE 

By B. I. Hart 

Ballistic Research Laboratory, Aberdeen Proving Ground 
With a note

By John von Neumann 


In recent publications von Neumann has determined the distribution of
$\delta^2/s^2$, the ratio of the mean square successive difference to the variance, for odd
values of the sample size $n$¹ and for even values of $n$.² In this paper the probability
function, i.e., the integral of the distribution, is evaluated for specific
values of $n$.

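Let $x$ be a stochastic variable normally distributed with mean $\xi$ and standard
deviation $\sigma$. The following customary definitions for the sample are used:

the mean, $\bar{x} = \frac{1}{n}\sum_{\mu=1}^{n} x_\mu$,

the variance, $s^2 = \frac{1}{n}\sum_{\mu=1}^{n} (x_\mu - \bar{x})^2$,

and the mean square successive difference, $\delta^2 = \frac{1}{n-1}\sum_{\mu=1}^{n-1} (x_{\mu+1} - x_\mu)^2$.

Letting

$$\frac{\delta^2}{s^2} = \frac{2n}{n-1}(1 - \epsilon),$$

von Neumann shows that the distribution of $\epsilon$, $w(\epsilon)$, is symmetrical with zero mean
and intercepts equal to $\pm\cos\frac{\pi}{n}$ (loc. cit.¹, p. 372), and that $w(\epsilon)$ is determined
for odd values of $n$ by

$$\frac{d^{\frac{1}{2}(n-3)}}{d\epsilon^{\frac{1}{2}(n-3)}}\,w(\epsilon) = \pm\frac{\left[\frac{1}{2}(n-1) - 1\right]!}{\pi}\cdot\frac{1}{\sqrt{-\prod_{k=1}^{n-1}\left(\epsilon - \cos\frac{k\pi}{n}\right)}}$$

in the odd intervals

$$\cos\frac{2\pi}{n} \le \epsilon \le \cos\frac{\pi}{n}, \quad \cos\frac{4\pi}{n} \le \epsilon \le \cos\frac{3\pi}{n}, \quad \ldots, \quad \cos\frac{(n-1)\pi}{n} \le \epsilon \le \cos\frac{(n-2)\pi}{n},$$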


¹ John von Neumann, "Distribution of the ratio of the mean square successive difference
to the variance," Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

² John von Neumann, "A further remark on the distribution of the ratio of the mean
square successive difference to the variance," Annals of Math. Stat., Vol. 13 (1942), pp. 86-88.




B. I. HART 


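and by

$$\frac{d^{\frac{1}{2}(n-3)}}{d\epsilon^{\frac{1}{2}(n-3)}}\,w(\epsilon) = 0$$

in the even intervals

$$\cos\frac{3\pi}{n} \le \epsilon \le \cos\frac{2\pi}{n}, \quad \cos\frac{5\pi}{n} \le \epsilon \le \cos\frac{4\pi}{n}, \quad \ldots, \quad \cos\frac{(n-2)\pi}{n} \le \epsilon \le \cos\frac{(n-3)\pi}{n}$$

(loc. cit.¹, pp. 389-390).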
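The definitions above can be made concrete with a short Python sketch. It is an illustration and not part of the paper. It computes $\delta^2$ and $s^2$ with the divisors $n-1$ and $n$ used here, then the ratio $\delta^2/s^2$ and the corresponding $\epsilon$. It also lists the points $\cos(k\pi/n)$ that separate the odd and even intervals. The function names are hypothetical.

```python
import math

def successive_difference_stats(xs):
    """Return (delta2, s2, ratio, eps) for one sample, using the divisors
    n for s^2 and n - 1 for delta^2, as in the definitions above."""
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / n
    delta2 = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)
    ratio = delta2 / s2
    # Invert delta^2/s^2 = 2n/(n-1) * (1 - eps) to recover eps.
    eps = 1 - ratio * (n - 1) / (2 * n)
    return delta2, s2, ratio, eps

def interval_boundaries(n):
    """The points cos(k*pi/n), k = 1, ..., n-1 (in decreasing order), which bound the
    odd and even intervals; the outermost pair are the intercepts +-cos(pi/n)."""
    return [math.cos(k * math.pi / n) for k in range(1, n)]

# A steadily increasing sample gives eps close to the upper intercept cos(pi/5).
d2, s2, ratio, eps = successive_difference_stats([1, 2, 3, 4, 5])
print(d2, s2, ratio, eps, interval_boundaries(5)[0])
```

For the trend sample 1, 2, 3, 4, 5 the ratio is 0.5 and $\epsilon = 0.8$. This lies just inside the upper intercept $\cos(\pi/5) \approx 0.809$.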

For $n = 3$,

$$w(\epsilon) = \frac{1}{\pi}\cdot\frac{1}{\sqrt{\frac{1}{4} - \epsilon^2}} \qquad (1)$$

for $\cos\frac{2\pi}{3} \le \epsilon \le \cos\frac{\pi}{3}$.
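Formula (1) integrates in closed form. The sketch below is our own derivation rather than something stated in the text. For $n = 3$, $\delta^2/s^2 = 3(1-\epsilon)$, so the sketch converts a bound $k$ on $\delta^2/s^2$ into a bound on $\epsilon$. It then uses the distribution function $\tfrac12 + \arcsin(2\epsilon)/\pi$ implied by (1).

```python
import math

def prob_ratio_below_n3(k):
    """P(delta^2/s^2 < k) for n = 3, obtained by integrating (1).

    Since delta^2/s^2 = 3(1 - eps), the event delta^2/s^2 < k is eps > 1 - k/3,
    and the distribution function of eps on [-1/2, 1/2] is 1/2 + asin(2 eps)/pi.
    """
    e0 = 1 - k / 3
    e0 = max(-0.5, min(0.5, e0))      # clamp to the support of eps
    return 0.5 - math.asin(2 * e0) / math.pi

for k in (1.5, 2.0, 3.0, 4.5):
    print(k, round(prob_ratio_below_n3(k), 5))
```

The range of $\delta^2/s^2$ for $n = 3$ is $[1.5, 4.5]$, and the median is the mean value $2n/(n-1) = 3$.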

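For $n = 5$,

$$w'(\epsilon) = \mp\frac{1}{\pi}\cdot\frac{1}{\sqrt{-\epsilon^4 + \frac{3}{4}\epsilon^2 - \frac{1}{16}}}$$

(upper sign for $\epsilon > 0$), so that

$$w(\epsilon) = \frac{1}{\pi}\cdot\frac{2}{\cos\frac{\pi}{5} + \cos\frac{2\pi}{5}}\;\mathrm{sn}^{-1}\!\left(\sqrt{\frac{\left(\cos\frac{\pi}{5} + \cos\frac{2\pi}{5}\right)\left(\cos\frac{\pi}{5} - |\epsilon|\right)}{\left(\cos\frac{\pi}{5} - \cos\frac{2\pi}{5}\right)\left(\cos\frac{\pi}{5} + |\epsilon|\right)}},\; \frac{\cos\frac{\pi}{5} - \cos\frac{2\pi}{5}}{\cos\frac{\pi}{5} + \cos\frac{2\pi}{5}}\right) \qquad (2)$$

for $\cos\frac{2\pi}{5} \le \epsilon \le \cos\frac{\pi}{5}$ and $\cos\frac{4\pi}{5} \le \epsilon \le \cos\frac{3\pi}{5}$. But for $\cos\frac{3\pi}{5} \le \epsilon \le \cos\frac{2\pi}{5}$, $w'(\epsilon) = 0$; thus

$$w(\epsilon) = \text{const.} \qquad (3)$$

For $n = 7$,

$$w''(\epsilon) = \pm\frac{2}{\pi}\cdot\frac{1}{\sqrt{-\epsilon^6 + \frac{5}{4}\epsilon^4 - \frac{3}{8}\epsilon^2 + \frac{1}{64}}} \qquad (4)$$

for $\cos\frac{2\pi}{7} \le \epsilon \le \cos\frac{\pi}{7}$ and $\cos\frac{6\pi}{7} \le \epsilon \le \cos\frac{5\pi}{7}$ with the $+$ sign, and for $\cos\frac{4\pi}{7} \le \epsilon \le \cos\frac{3\pi}{7}$ with the $-$ sign. But for $\cos\frac{3\pi}{7} \le \epsilon \le \cos\frac{2\pi}{7}$ and $\cos\frac{5\pi}{7} \le \epsilon \le \cos\frac{4\pi}{7}$, $w''(\epsilon) = 0$; thus

$$w'(\epsilon) = \text{const.} \qquad (5)$$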
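The polynomials under the radicals for $n = 5$ and $n = 7$ are just $-\prod_{k}\left(\epsilon - \cos\frac{k\pi}{n}\right)$ multiplied out. The following sketch is an illustrative check with hypothetical helper names. It expands the product numerically, so the coefficients of $-\epsilon^4 + \frac34\epsilon^2 - \frac1{16}$ and $-\epsilon^6 + \frac54\epsilon^4 - \frac38\epsilon^2 + \frac1{64}$ can be confirmed. It also evaluates the sign, which is positive in the odd intervals and negative in the even ones.

```python
import math

def radicand(n):
    """Coefficients (constant term first) of -prod_{k=1}^{n-1} (eps - cos(k*pi/n)),
    the polynomial under the square root in the formula for odd n."""
    poly = [1.0]
    for k in range(1, n):
        root = math.cos(k * math.pi / n)
        nxt = [0.0] * (len(poly) + 1)
        for i, coef in enumerate(poly):      # multiply the running product by (eps - root)
            nxt[i] -= root * coef
            nxt[i + 1] += coef
        poly = nxt
    return [-coef for coef in poly]

def evaluate(coefs, eps):
    """Value at eps of the polynomial with the given coefficients."""
    return sum(coef * eps ** i for i, coef in enumerate(coefs))

for n in (3, 5, 7):
    print(n, [round(coef, 6) for coef in radicand(n)])
```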

³ For the numerical evaluation, the inverse sine-amplitude function used for $n$ = 4, 5, 6
is taken from unpublished tables of the Legendrian elliptic …
The square of the modulus is the argument for this tabulation.



TABULATION OF PROBABILITIES


For even values of $n$ von Neumann shows that the distribution of $\epsilon$,

^ +(0)(e) = r[[(n - 2)]r(i) lu L (j) pi ”“ 3(1 ~ pr ‘ dp (loc - cit ©- 


For $n = 4$,

$$w(\epsilon) = \frac{1}{\sqrt{2}\,\pi}\int_{\sqrt{2}|\epsilon|}^{1}\frac{d\rho}{\sqrt{(\rho - \sqrt{2}\,\epsilon)(\rho + \sqrt{2}\,\epsilon)(1 - \rho)}} = \frac{\sqrt{2}}{\pi\sqrt{1 + \sqrt{2}\,|\epsilon|}}\;\mathrm{sn}^{-1}\!\left(1,\ \sqrt{\frac{1 - \sqrt{2}\,|\epsilon|}{1 + \sqrt{2}\,|\epsilon|}}\right) \qquad (6)$$

for $\cos\frac{3\pi}{4} \le \epsilon \le \cos\frac{\pi}{4}$.
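As a numerical sanity check on (6) in the form given here, the sketch below evaluates $w(\epsilon)$ for $n = 4$. It computes $\mathrm{sn}^{-1}(1, k) = K(k)$ from the arithmetic-geometric mean identity $K(k) = \pi/[2\,\mathrm{AGM}(1, \sqrt{1-k^2})]$. It then integrates $w$ and $\epsilon^2 w$ by the midpoint rule. The results should be close to 1 and to $E(\epsilon^2) = (n-2)/[(n-1)(n+1)] = 2/15$, the value implied by the standard deviation of $\delta^2/s^2$ quoted later in the paper. The code and its names are ours.

```python
import math

def K_from_complement(kp):
    """Complete elliptic integral K(k), given the complementary modulus
    k' = sqrt(1 - k^2) > 0, via K = pi / (2 * AGM(1, k'))."""
    a, b = 1.0, kp
    while a - b > 1e-15 * a:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return math.pi / (2 * a)

def w_n4(eps):
    """Density (6) of eps for n = 4, using sn^{-1}(1, k) = K(k)."""
    t = math.sqrt(2) * abs(eps)
    if t == 0 or t > 1 + 1e-12:
        raise ValueError("need 0 < |eps| <= cos(pi/4)")
    t = min(t, 1.0)
    kp = math.sqrt(2 * t / (1 + t))          # k'^2 = 1 - (1 - t)/(1 + t)
    return math.sqrt(2) * K_from_complement(kp) / (math.pi * math.sqrt(1 + t))

def midpoint_moment(power, cells=20_000):
    """Integral of eps**power * w(eps) over [-cos(pi/4), cos(pi/4)] by the midpoint
    rule, using the symmetry of w (power must be even)."""
    c = math.cos(math.pi / 4)
    h = c / cells
    return 2 * h * sum(((i + 0.5) * h) ** power * w_n4((i + 0.5) * h) for i in range(cells))

print(midpoint_moment(0), midpoint_moment(2), w_n4(math.cos(math.pi / 4)))
```

At the intercept the density equals 1/2 rather than 0. This is consistent with an order of vanishing $\tfrac12 n - 2 = 0$ for $n = 4$.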


r i «L» (-) 

For n = 6, aj^ + «,)(<) = f / — . -- dp, where 

J2f/v/* A/ 1 rv 


(7) 


ua 


0- 


ir(V3 + 1) 


*«/’/« •\/1 — p 


7r 2T 2^r 5 -tt 

for cos - S « § cos - and cos cos and where 

O o o D 


( 8 ) 


«(«) = const. 


e vr _ Air 

for cos - S e ^ cos - 7 - . 

O o 


The integrals needed to obtain $w(\epsilon)$ for $n = 6$ and $w'(\epsilon)$ for $n = 7$ have been
evaluated by numerical quadrature. Graphs of the distribution of $\delta^2/s^2$, $w(\delta^2/s^2)$,
for $n$ = 3, 4, 5, 6, 7, are shown in Fig. 1.

The probability $P\left(\frac{\delta^2}{s^2} < k\right) = \int_0^k w(\delta^2/s^2)\,d(\delta^2/s^2)$ has been obtained
from $w(\delta^2/s^2)$ by numerical quadrature for $n$ = 4, 5, 6, 7. The results
are given in Table III.
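Table III can be spot-checked by simulation. This sketch is ours, with an arbitrary seed and sample count. It draws normal samples of size $n$, forms $\delta^2/s^2$ exactly as defined above, and estimates $P(\delta^2/s^2 < k)$. For $n = 7$ and $k = 1.20$ the estimate should be near the tabulated .07020, up to a Monte Carlo error of roughly 0.001.

```python
import random

def simulated_prob_ratio_below(n, k, trials=60_000, seed=12345):
    """Monte Carlo estimate of P(delta^2/s^2 < k) for samples of n independent
    standard normal values (the ratio does not depend on the mean or variance)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        s2 = sum((x - mean) ** 2 for x in xs) / n
        d2 = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1)) / (n - 1)
        if d2 / s2 < k:
            hits += 1
    return hits / trials

print(simulated_prob_ratio_below(7, 1.20))
```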

As is mentioned by von Neumann, R. H. Kent has suggested a series approximation
of the form

$$w(\epsilon) = \sum_{h=0}^{\infty} a_h\left(\cos^2\frac{\pi}{n} - \epsilon^2\right)^{\frac{1}{2}n - 2 + h},$$







since the order of vanishing of $w(\epsilon)$ is $\frac{1}{2}n - 2$, and since $w(\epsilon)$ is an even function
of $\epsilon$ (loc. cit.¹, p. 391). Determining the $a_h$ by the condition of normalization
and by the first three even moments of the actual distribution, $M_2$, $M_4$ and $M_6$
(given on pp. 377-378, loc. cit.¹), and integrating the result, we obtain

r* » / _ \Jn-s-u 

P(t < k') = / £ a h (cos* - - «*) dt 

* —oo* tw \ n / 

n ' ' 


_ (n-l)(n+l)(n+3) 
2 * 


IMn - 2], Mw - 2]) 


_ I _|_ Af-ijn + 5) _ M<(n -f 5 )(n + 7 ) M«(n + 5)(n -f 7)(n + 9)~ 

cos 5 ? 3 cos 1 - 45 cos 8 - 

'* 71 71 















(9) 


+ (n+Mt+ Mn+j) U i )Li {n ) 

A 


1 - 


M-A2>n + 13) 

2 7r 

cos - 
n 


+ 


M 4 (3n + ll)(n + 7) il/«(n + 3)(» + 7)(» + 9)' 


o 4 

3 cos - 


+ ( - +JM1+-® (n _±_ 7 J> /r (I[„ + 2], I[« + 2]) 


ip 0 7T 

15 cos - 
n 

_ M 2 _(3)i + 11) 

, v 

cos 2 - 


MiiSn + 19)(w + 3) MAt i + 3)(n 4- 5)(n j- 9 ) 
3 cos 4 - 15 cos 6 ~ 


+ («±Wn+J^J) UUn + 4]j |(/1 + 4]) 


"1 Mi(n + 3) 


cos 2 


+ 


MAn + 3) (» + 5) A/ f ,(n + 3)(n + 5)(» + 7)' 


3 cos 4 - 
n 


45 cos 6 - 
n 


The Tables of the Incomplete Beta-Function⁴ can be used to evaluate (9), with
$x = \frac{1}{2}\left(1 + \frac{k'}{\cos(\pi/n)}\right)$. Table I shows the results obtained for the eighth and
tenth moments for the distribution (9) and for the true distribution for certain
values of $n$.

Table II gives a tabulation of $P\left(\frac{\delta^2}{s^2} < k\right)$ for $n = 7$ by the use of (9) and by
the method of (4) and (5). The approximation (9) has been used for the computation
of the probabilities of Table III for $n \ge 8$.⁵

It has been shown (loc. cit.¹, pp. 378-379) that for $n \to \infty$ the distribution of
$\epsilon$ becomes asymptotically normal. For $n = 60$, values of $\delta^2/s^2$ are given below
for different levels of significance. These values have been computed from
Table III and from a table of the integral of the normal function with standard
deviation equal to $\frac{2n}{n-1}\sqrt{\frac{n-2}{(n-1)(n+1)}}$, the square root of the second
moment about the mean of the distribution of $\delta^2/s^2$.


⁴ Karl Pearson (Editor), Tables of the Incomplete Beta-Function, London, Biometrika
Office, 1934.

⁵ The results obtained by L. C. Young using the Pearson Type II distribution are sufficiently
precise for the significance levels and sample sizes tabulated. Cf. L. C. Young,
"On randomness in ordered sequences," Annals of Math. Stat., Vol. 12 (1941), pp. 293-300.






TABLE I



i/« 

J /, 

J/i? 



m 

Tret 

o» 

True 

7 

.00-412 

4X1413 

.00201 

00202 

8 

00318 

003 IS 

001.9) 

.00151 

0 

.00240 

i 

.00240 

.1 

om u 

.(X1U2 


TABLE II

$P\left(\frac{\delta^2}{s^2} < k\right)$ for $n = 7$

k

By (9)

By (4) and (5)

.26 

.00001 i 

.(KXXU 

.30 

00007 1 

00007 

.35 

.00027 i 

(XX127 

40 

00065 

mm 

.45 

.00124 

(X)12li 

.50 

.00209 

00214 

,55 

00326 

.00333 

60 

00478 i 

0O4M1 

65 

00071 

00673 

70 

00011 j 

00913 

.76 

01203 

j 01197 

.80 

01552 

01531 

85 

0196-1 

01932 

00 

02443 

.02403 

.95 

.02995 

.02957 

1 00 

.03624 

.03593 

1 05 

.04333 

.04325 

1.10 

.06126 

.05137 

1.16 

.06000 

.06030 

1 20 

.06976 

.07020 

Values of $\delta^2/s^2$ for Different Levels of Significance, $n = 60$

               P = .001   P = .005   P = .01   P = .05
Table III       1.2558     1.3779     1.4384    1.6082
Normal          1.2358     1.3688     1.4333    1.6092
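The Normal row can be reproduced with Python's standard-library `statistics.NormalDist`. The sketch uses mean $2n/(n-1)$, since $\epsilon$ has zero mean, and the standard deviation quoted above. It is a check on the table, not the author's computation. It agrees with the Normal row to within about 0.0002.

```python
from statistics import NormalDist

def normal_critical_values(n, levels):
    """Lower-tail critical values of delta^2/s^2 under the normal approximation,
    with mean 2n/(n-1) and standard deviation 2n/(n-1) * sqrt((n-2)/((n-1)(n+1)))."""
    mean = 2 * n / (n - 1)
    sd = mean * ((n - 2) / ((n - 1) * (n + 1))) ** 0.5
    dist = NormalDist(mean, sd)
    return {p: dist.inv_cdf(p) for p in levels}

for p, value in normal_critical_values(60, [0.001, 0.005, 0.01, 0.05]).items():
    print(p, round(value, 4))
```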


This work was undertaken at the suggestion of Mr. R. H. Kent. I am 
much indebted to him and to Professor John von Neumann for many important 
suggestions and criticisms. 


Note to Fig. 1, by John von Neumann. Inspection of the graphs of $w(\delta^2/s^2)$
for $n$ = 3, 4, 5, 6, 7 (see Fig. 1) discloses certain singularities of the function
$w(\delta^2/s^2)$, which seem to deserve attention,




TABLE III

$P\left(\frac{\delta^2}{s^2} < k\right)$


71 

4 

5 

0 

7 

s 

0 

10 

n 

12 

25 




00001 

00001 

00001 

00001 



30 




00007 

00007 

.00005 

00004 

00002 

00001 

35 



00006 

00027 

00021 

.00014 

00009 

.00005 

00003 

40 



00047 

00065 

.00047 

00031 

00019 

00012 

00007 

45 



00126 

00120 

.OOOS8 

00059 

0003S 

,00025 

00016 

.50 


00038 

00246 

00214 

00150 

00103 

00009 

00046 

00031 

. 55 


00223 

.00409 

00333 

.00237 

00168 

,00116 

.00080 

00055 

60 


00493 

00615 

00186 

00355 

00259 

.00185 

00132 

00094 

.65 


00830 

00865 

00078 

00511 

00382 

00282 

00208 

(XU 52 

70 


01225 

01161 

00013 

00710 

00544 

.00414 

00313 

00235 

75 


01673 

01505 

01197 

00958 

00753 

.00587 

00455 

00351 

80 

00356 

02171 

01900 

01534 

01263 

01015 

00809 

00642 

00508 

85 

01302 

02717 

02348 

01932 

01631 

01338 

010S9 

00883 

00714 

90 

.02257 

.03310 

02851 

02403 

.02068 

01729 

0143G 

0118S 

.00980 

95 

03223 

03949 

03412 

.02957 

02579 

02190 

.01858 

01565 

01316 

1.00 

04199 

.04634 

.04035 

03598 

,03171 

.02745 

02363 

.02025 

01733 

1 05 

.05186 

05364 

0472S 

04325 

03819 

.03384 

02959 

02578 

.02241 

1 10 

06184 

.06140 

05500 

05137 

04618 

.04120 

03655 

03232 

02S52 

1 15 

07194 

.06963 

.00361 

06036 

.05482 

04957 

04458 

03997 

03577 

1.20 



07323 

07020 

0G445 

05901 

05375 

048S2 

01425 

1 25 






.06056 

06412 

05S94 

05407 

1 30 








07040 

06531 



15 

20 

25 

30 

40 

50 

00 

35 

40 

45 

.50 

.55 

00 

65 

70 

75 

80 

.85 

00001 

00002 

00004 

.00009 

.00018 

.00033 

00059 

00100 

00161 

00250 

.00375 

.00001 

00002 

00005 

.00012 

.00024 

.00044 

00070 

.00127 

.00001 

00002 

.00005 

00011 

.00023 

00044 

.00001 

00003 

.00007 

00015 

OOOOl 

00002 



90 

.00547 

00206 

00079 

00030 

00004 

.00001 


95 

00778 

.00323 

00135 

00057 

,00010 

00002 


1 00 

.01070 

.00489 

00222 

00102 

.00022 

00005 

OOOOl 

1 05 

01465 

00720 

00355 

.00176 

00044- 

00012 

.00003 

1 10 

.01950 

01033 

00550 

00294 

00085 

00026 

.00008 

1 15 

02550 

0144S 

OOS26 

00474 

0015S 

00054 

.'00010 

1 20 

03280 

01986 

0120S 

.00738 

,00280 

00108 

00043 

1 25 

04155 

02670 

01723 

01117 

00476 

00206 

00002 

1 30 

05189 

03524 

02102 

.01644 

00780 

.00376 

00185 

1.35 

06396 

04571 

03276 

02357 

01235 

.OOG5G 

00355 

1.40 

07787 

05834 

04379 

03298 

01S92 

.01098 

.00040 

1 45 

1 50 

1.55 

1.00 

1.05 

1 70 


.07333 

.05743 

07398 

04511 

06038 

07920 

02810 

.04055 

05690 

07797 

01769 

02750 

.01131 

.06006 

•0S4C5 

.01133 

01893 

.03034 

.04075 

.00912 

.00940 



Values of $k$ for which $P\left(\frac{\delta^2}{s^2} < k\right) = 0$

  n      k          n      k
  4    .7811       15    .0468
  5    .4773       20    .0259
  6    .3215       25    .0164
  7    .2311       30    .0113
  8    .1740       40    .0063
  9    .1357       50    .0040
 10    .1088       60    .0028
 11    .0891
 12    .0743
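The entries of this table are the lower end of the range of $\delta^2/s^2$. They follow from putting $\epsilon = \cos(\pi/n)$ in $\delta^2/s^2 = \frac{2n}{n-1}(1-\epsilon)$. The illustrative sketch below recomputes them. All agree with the table to within two units in the fourth decimal.

```python
import math

def k_zero(n):
    """Smallest possible value of delta^2/s^2, namely 2n/(n-1) * (1 - cos(pi/n)),
    so that P(delta^2/s^2 < k) = 0 for k <= k_zero(n)."""
    return 2 * n / (n - 1) * (1 - math.cos(math.pi / n))

TABLE = {4: .7811, 5: .4773, 6: .3215, 7: .2311, 8: .1740, 9: .1357, 10: .1088,
         11: .0891, 12: .0743, 15: .0468, 20: .0259, 25: .0164, 30: .0113,
         40: .0063, 50: .0040, 60: .0028}

for n, tabulated in TABLE.items():
    print(n, round(k_zero(n), 5), tabulated)
```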




IRVING W. BURR


Furthermore it may be shown that

$$F(x) = \int_{-\infty}^{x} f(t)\,dt, \qquad F'(x) = f(x), \qquad (2)$$

where $f(x)$ is the ordinary probability function. Also

$$P(a < x < b) = \int_a^b f(x)\,dx. \qquad (3)$$

Similarly for the discrete case,

$$F(x) = \sum_{t=-\infty}^{x-h} f(t), \qquad \Delta_h F(x) = f(x), \qquad (4)$$

$$P(a \le x \le b) = F(b + h) - F(a) = \sum_{t=a}^{b} f(t), \qquad (5)$$

where $a$, $b$ are among the values $nh + d$, and $\Delta_h$ is the usual $h$-difference. In both
cases the percentiles are given by the solutions of the equations

$$F(x) = n/100. \qquad (6)$$
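The sketch below applies (4) and (5) in exact arithmetic to a hypothetical four-point lattice distribution, with values $nh + d$, $h = 1/2$ and $d = 1/4$. It takes $F(x)$ as the total probability of values strictly below $x$. Under that convention both $\Delta_h F(x) = f(x)$ and $F(b+h) - F(a) = \sum_{t=a}^{b} f(t)$ hold.

```python
from fractions import Fraction

# Hypothetical lattice distribution on the values nh + d (illustration only).
h, d = Fraction(1, 2), Fraction(1, 4)
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
pmf = {d + i * h: p for i, p in enumerate(probs)}

def F(x):
    """Discrete cumulative function of (4): total probability of the values t < x."""
    return sum((p for t, p in pmf.items() if t < x), Fraction(0))

def delta_h(G, x):
    """The usual h-difference, Delta_h G(x) = G(x + h) - G(x)."""
    return G(x + h) - G(x)

def prob_between(a, b):
    """P(a <= x <= b) for lattice points a <= b, computed as F(b + h) - F(a) as in (5)."""
    return F(b + h) - F(a)

values = sorted(pmf)
print(prob_between(values[1], values[2]))   # 3/4
```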

Equations (1), (3) and (5) express the advantage of using $F(x)$ directly,
which was mentioned in section 1. Related to this is the fact that finding
$f(x)$ from $F(x)$ is, at least theoretically, much simpler than the converse,
as (2) and (4) show. The directness of equation (6) is often an advantage also.

The main problems confronting one in trying to utilize these advantages are
(a) to find suitable cumulative functions and (b) to find methods of fitting $F(x)$
directly. These are discussed next.

3. Some special functions $F(x)$. An obvious method of attack is to use (2)
or (4) on some $f(x)$, but the integration involved is precisely the difficulty the
writer wishes to avoid. The cumulative function might instead be sought directly in
probability theory. A differential equation incorporating some of the properties
of $F(x)$ given in section 2 is

$$\frac{dy}{dx} = y(1 - y)\,g(x, y), \qquad y = F(x), \qquad (7)$$

where $g(x, y)$ is to be positive for $0 < y < 1$ and $x$ in the range over which the
solution is to be used. It is to be noted that (7) is very similar to the differential
equation

$$\frac{dy}{dx} = y(m - x)\,g(x, y), \qquad y = f(x),$$

which generates the Pearson system if $g(x, y) = (a + bx + cx^2)^{-1}$.



CUMULATIVE FREQUENCY FUNCTIONS 




Equation (7) implies the non-decreasing property for $F(x)$, while for many
choices of $g(x, y)$, $dy/dx$ will be zero at $y = 0$ and $y = 1$. When $g(x, y) = g(x)$,
(7) becomes

$$F(x) = \left[e^{-\int g(x)\,dx} + 1\right]^{-1}. \qquad (8)$$

Some functions $g(x)$ whose integrals are such that $F(x)$ increases from 0 to 1
over the corresponding interval are $c$, $cx^{-1}$, $[(c - x)x]^{-1}$, $c\sec^2 x$ and $c\cosh x$,
where $c > 0$. Generalizations of their corresponding $F(x)$ are given below in
(10)-(14) respectively.

Another method of attack is simply to consider functions which have the
properties given in section 2. The assumption of high contact provides for the
existence of certain integrals to be discussed in section 5. Many functions
having the required properties are to be found in tables of definite integrals,
particularly Bierens de Haan [1].

A list of particular $F(x)$ is given below. In all cases the number of parameters
would be increased by two by letting $x = \gamma x' + \delta$, where $\gamma$ and $\delta$ fix the origin
and scale. These parameters are determined by $\bar{x}$ and $\sigma$. The range of $x$
over which the given expression is to be used is written to the right when it is
not $(-\infty, \infty)$. Constants $k$, $r$ and $c$ are positive real numbers.


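(9) $F(x) = x$, $(0, 1)$,

(10) $F(x) = (e^{-x} + 1)^{-r}$,

(11) $F(x) = (x^{-c} + 1)^{-r}$, $(0, \infty)$,

(12) $F(x) = \left[\left(\frac{c - x}{x}\right)^{1/c} + 1\right]^{-r}$, $(0, c)$,

(13) $F(x) = (k e^{-c\tan x} + 1)^{-r}$, $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$,

(14) $F(x) = (k e^{-c\sinh x} + 1)^{-r}$,

(15) $F(x) = 2^{-r}(1 + \tanh x)^{r}$,

(16) $F(x) = \left(\frac{2}{\pi}\arctan e^{x}\right)^{r}$,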



(17) 

F{x) 1 fc[(l + e*y - 1] + 2’ 

(18) $F(x) = (1 - e^{-x^2})^{r}$, $(0, \infty)$,

(19) $F(x) = \left(x - \frac{1}{2\pi}\sin 2\pi x\right)^{r}$, $(0, 1)$,

(20) $F(x) = 1 - (1 + x^{c})^{-k}$, $(0, \infty)$.
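Here is a quick numerical check that the functions listed above behave as cumulative functions, that is, they rise monotonically from about 0 to about 1 over their ranges. This is a sketch, not part of the paper. The parameter values ($k = 2$, $r = 1.5$, $c = 3$) are arbitrary illustrative choices. Infinite ranges are cut down to finite grids, and (17) is left out.

```python
import math

k, r, c = 2.0, 1.5, 3.0          # hypothetical illustrative parameter values

# name: (F, lower end of grid, upper end of grid)
FAMILY = {
    "(9)": (lambda x: x, 0.0, 1.0),
    "(10)": (lambda x: (math.exp(-x) + 1) ** -r, -30.0, 30.0),
    "(11)": (lambda x: (x ** -c + 1) ** -r, 1e-3, 1e3),
    "(12)": (lambda x: (((c - x) / x) ** (1 / c) + 1) ** -r, 1e-12, c - 1e-12),
    "(13)": (lambda x: (k * math.exp(-c * math.tan(x)) + 1) ** -r, -1.4, 1.4),
    "(14)": (lambda x: (k * math.exp(-c * math.sinh(x)) + 1) ** -r, -4.0, 4.0),
    "(15)": (lambda x: 2 ** -r * (1 + math.tanh(x)) ** r, -20.0, 20.0),
    "(16)": (lambda x: (2 / math.pi * math.atan(math.exp(x))) ** r, -40.0, 40.0),
    "(18)": (lambda x: (1 - math.exp(-x * x)) ** r, 0.0, 8.0),
    "(19)": (lambda x: (x - math.sin(2 * math.pi * x) / (2 * math.pi)) ** r, 0.0, 1.0),
    "(20)": (lambda x: 1 - (1 + x ** c) ** -k, 0.0, 1e3),
}

def looks_cumulative(F, lo, hi, points=401, tol=1e-3):
    """True if F is non-decreasing on a uniform grid over [lo, hi] and runs from
    below tol to above 1 - tol."""
    xs = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    ys = [F(x) for x in xs]
    monotone = all(b >= a - 1e-12 for a, b in zip(ys, ys[1:]))
    return monotone and ys[0] < tol and ys[-1] > 1 - tol

for name, (F, lo, hi) in FAMILY.items():
    print(name, looks_cumulative(F, lo, hi))
```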


Most of these functions have unimodal probability functions $f(x)$, and all of
the functions may be readily handled from the calculational standpoint. To
check upon their suitability for practical work, the values of $\alpha_3$ and $\alpha_4$ for some
special cases were obtained approximately by evaluating $F(x)$ at a convenient
regular interval, differencing, and using the results as frequencies of a discrete

TABLE I

Calculated $\alpha_3$ and $\alpha_4$ for special functions $F(x)$


Function ) 

_____ i 

Parameters 



"i 

(15) i 

7 =■ 1 


ii 

■1.01 

(16) 

r » 1 


a 

3.21 

(17) 

k = 1, r **> 

2 i 

- .62 

•1.50 

(17) 

l « 2, r » 

l i 

t) 

‘1.11 

(17) 

k » 2, r = 

2 

i 

- 51 

4.22 

(is) 2 

r = 1 

1 

.03 

3.25 

(lO) 2 

r = 1 

i 

i 

f) 

2 41 


variable. No correction for grouping was made. The values of $\alpha_3$ and $\alpha_4$
for several of the above functions are given in Table I, where

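$$\mu'_j = \int_{-\infty}^{\infty} x^j f(x)\,dx, \qquad \sum_t t^j f(t),$$

$$\mu_j = \int_{-\infty}^{\infty} (x - \mu'_1)^j f(x)\,dx, \qquad \sum_t (t - \mu'_1)^j f(t), \qquad (21)$$

$$\alpha_j = \frac{\mu_j}{\sigma^j}, \qquad \sigma^2 = \mu_2.$$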
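The procedure described above is easy to sketch: difference $F(x)$ at a regular interval, treat the differences as frequencies, and apply (21) without a grouping correction. The example applies it to (15) with $r = 1$, which is the logistic distribution ($\alpha_3 = 0$, $\alpha_4 = 4.2$). It also applies it to (18) with $r = 1$, read as $1 - e^{-x^2}$, which is the Rayleigh distribution ($\alpha_3 \approx 0.631$, $\alpha_4 \approx 3.245$). The interval widths and truncation points are our own choices.

```python
import math

def alphas_by_differencing(F, lo, hi, step):
    """alpha_3 and alpha_4 of the discrete distribution obtained by evaluating F at a
    regular interval, differencing, and placing each difference at its cell midpoint.
    No correction for grouping is made."""
    cells = int(round((hi - lo) / step))
    edges = [lo + i * step for i in range(cells + 1)]
    freqs = [F(b) - F(a) for a, b in zip(edges, edges[1:])]
    mids = [a + step / 2 for a in edges[:-1]]
    total = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, mids)) / total
    mu = {j: sum(f * (m - mean) ** j for f, m in zip(freqs, mids)) / total for j in (2, 3, 4)}
    sigma = math.sqrt(mu[2])
    return mu[3] / sigma ** 3, mu[4] / sigma ** 4

def logistic(x):
    """(15) with r = 1: F(x) = (1 + tanh x) / 2."""
    return (1 + math.tanh(x)) / 2

def rayleigh(x):
    """(18) with r = 1, read as F(x) = 1 - exp(-x^2) on (0, infinity)."""
    return 1 - math.exp(-x * x) if x > 0 else 0.0

print(alphas_by_differencing(logistic, -20.0, 20.0, 0.01))
print(alphas_by_differencing(rayleigh, 0.0, 8.0, 0.001))
```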

It will be seen that a variety of values of $\alpha_4$ appear. The values of $\alpha_3$ vary
considerably in most cases as $r$ varies. These functions show promise of being
useful after further investigation. The values of $\alpha_3$ and $\alpha_4$ for (20) are convenient
and adaptable. This function will be discussed in detail in section 6.

4. Methods of fitting $F(x)$. The problem of graduation of data by a cumulative
function involves three steps: (a) the selection of the type of function,
(b) the determination of the parameters of the function, and (c) the graduation.
The first two are often determined by such moment characteristics as $\alpha_3$ and
$\alpha_4$, as in the Pearson system of frequency functions. The third step involves
integration or summation if $f(x)$ is used, whereas, once $F(x)$ is fitted, all that
remains to be done is evaluation of the function and differencing.

To fit $F(x)$ by moments, it must be possible to determine the parameters of
$F(x)$ from $\bar{x}$, $\sigma$, $\alpha_3$ and $\alpha_4$. The cumulative moments described in the next
section, when they can be evaluated, will lead to the values of $\bar{x}$, $\sigma$, $\alpha_3$ and $\alpha_4$
for various values of the parameters. If the relations between the parameters
and the moments are difficult or impossible to obtain, then tables may be constructed
and interpolation used. The usual process would be to use the $\alpha_3$

¹ The method of moments of section 5 was used for these values.







and $\alpha_4$ tables to determine the primary parameters such as $c$, $k$ and $r$ in (9)-(20).
Then for the given values of $c$, $k$, $r$, one computes the corresponding values of
$\bar{x}$ and $\sigma$ from their tables, and these are used to obtain the parameters $\gamma$ and $\delta$
for $x = \gamma x' + \delta$. This procedure is illustrated in section 6.

Even when the cumulative moments cannot be evaluated, this method is
still possible. Graduation by a small interval is used to construct tables of
$\bar{x}$, $\sigma$, $\alpha_3$ and $\alpha_4$ for varying values of the parameters. The table can then be
used as described above. Thus in practice any $F(x)$ can be fitted
by this technique.

The usefulness of a cumulative or a probability function depends upon how
wide a range of sets of values of the $\alpha_i$ the function covers, and whether such
values occur in practice. In most of the functions (9)-(20), $\alpha_3$ and $\alpha_4$ are continuous
functions of the parameters. If there is only one parameter, then only
$\alpha_3$ (or $\alpha_4$) can be fitted, within the range of values of $\alpha_3$ which the function possesses;
in the case of two parameters both $\alpha_3$ and $\alpha_4$ can be fitted. Three or more
parameters permit $\alpha_5$, etc., to be fitted.


5. Cumulative moment theory for $F(x)$. A moment definition for $F(x)$ is now
presented. Since for $n > 0$, $\lim_{b\to\infty}\int_0^b x^n F(x)\,dx = \infty$, the integral $\int_{-\infty}^{\infty} x^n F(x)\,dx$ cannot be
used. However, it was assumed in section 2 that for some $k > j + 1$,
$[1 - F(x)]x^k$ is ultimately bounded. Hence $\lim_{x\to\infty}[1 - F(x)]x^{j+1} = 0$. Thus
$1 - F(x)$ can be used as a factor when integrating over any interval $(a, \infty)$,
$a$ being finite. But the factor $F(x)$ must be used for an interval of the type
$(-\infty, b)$. Two integrals are needed, and we define the cumulative moment $M_j(a)$
by

$$M_j(a) = \int_a^{\infty} (x - a)^j[1 - F(x)]\,dx - \int_{-\infty}^{a} (x - a)^j F(x)\,dx, \qquad (22)$$


which exists under the assumptions of section 2. The difference of the integrals
is used because, as will be shown, this leads to simpler results than could be
obtained by addition. If $a = 0$ in (22), then, calling $M_j(0) = M_j$,

$$M_j = \int_0^{\infty} x^j[1 - F(x)]\,dx - \int_{-\infty}^{0} x^j F(x)\,dx. \qquad (23)$$


Definitions for the discrete case are similar:

$$M_j(a) = h\sum_{i=a+h}^{\infty} (i - a)^{(j)}[1 - F(i)] - h\sum_{i=-\infty}^{a} (i - a)^{(j)} F(i), \qquad (24)$$

$$M_j = h\sum_{i=h}^{\infty} i^{(j)}[1 - F(i)] - h\sum_{i=-\infty}^{0} i^{(j)} F(i), \qquad (25)$$

where $i^{(j)} = i(i - h)\cdots(i - [j - 1]h)$. This function is used because it has
simpler properties in the finite calculus than has $i^j$.
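For the discrete definitions, the case $j = 0$ should reproduce the discrete formula (42), $M_0(a) = \mu'_1 - a$. The exact-arithmetic sketch below checks this on a hypothetical lattice distribution. It uses the summation limits as written in (24) and takes $F(x)$ as the probability of values below $x$. The distribution and helper names are ours.

```python
from fractions import Fraction

# Hypothetical lattice distribution on the values 1/4 + i/2 (illustration only).
h = Fraction(1, 2)
support = [Fraction(1, 4) + i * h for i in range(4)]
pmf = dict(zip(support, [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]))

def F(x):
    """F(x) = total probability of the values strictly below x, as in (4)."""
    return sum((p for t, p in pmf.items() if t < x), Fraction(0))

def discrete_M0(a):
    """M_0(a) from (24): h * sum_{i=a+h}^{inf} [1 - F(i)] - h * sum_{i=-inf}^{a} F(i),
    with i on the lattice; terms beyond the support are zero and are omitted."""
    upper, i = Fraction(0), a + h
    while i <= max(support):          # 1 - F(i) = 0 for i > max(support)
        upper += 1 - F(i)
        i += h
    lower, i = Fraction(0), a
    while i > min(support):           # F(i) = 0 for i <= min(support)
        lower += F(i)
        i -= h
    return h * (upper - lower)

mean = sum(t * p for t, p in pmf.items())
print(mean, discrete_M0(Fraction(3, 4)))    # 1 1/4
```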





Various relations between the cumulative moments $M_j(a)$ and $M_j$, and between
these and the $\mu'_j$, $\mu_j$ and $\alpha_j$ of (21), are now developed. To express $M_j(a)$
in terms of the $M_i$'s, use $(x - a)^j = \sum_{i=0}^{j} {}_jC_i\,x^{j-i}(-a)^i$. Thus,

$$M_j(a) = \int_a^{\infty} (x - a)^j[1 - F(x)]\,dx - \int_{-\infty}^{a} (x - a)^j F(x)\,dx$$

$$= \int_0^{\infty} (x - a)^j[1 - F(x)]\,dx - \int_{-\infty}^{0} (x - a)^j F(x)\,dx - \int_0^{a} (x - a)^j\,dx,$$

$$M_j(a) = \sum_{i=0}^{j} {}_jC_i\,(-a)^i M_{j-i} + \frac{(-a)^{j+1}}{j + 1}. \qquad (26)$$

One reason for the minus sign of (22) may be noted here: in the contrary
case the last term would be $\int_0^{a} (x - a)^j[2F(x) - 1]\,dx$. By translating the
origin in (26) to $x = a$, renaming the moments, and replacing $-a$ by $a$, one
obtains

$$M_j = \sum_{i=0}^{j} {}_jC_i\,a^i M_{j-i}(a) + \frac{a^{j+1}}{j + 1}. \qquad (27)$$

To bring in ordinary moments, integration by parts and (2) are used:

$$M_j(a) = \left[\frac{(x - a)^{j+1}}{j + 1}[1 - F(x)]\right]_a^{\infty} + \int_a^{\infty}\frac{(x - a)^{j+1}}{j + 1}f(x)\,dx - \left[\frac{(x - a)^{j+1}}{j + 1}F(x)\right]_{-\infty}^{a} + \int_{-\infty}^{a}\frac{(x - a)^{j+1}}{j + 1}f(x)\,dx$$

$$= \frac{1}{j + 1}\int_{-\infty}^{\infty} (x - a)^{j+1} f(x)\,dx, \qquad (28)$$

the first and third quantities vanishing because of the contact assumption.
A second justification of the minus sign of (22) appears here, since if a positive
sign were used, the fourth term would have been subtracted and the integrals
would not combine into (28). Expansion of $(x - a)^{j+1}$ in powers of $x$ and of
$x - \mu'_1$ yields respectively


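$$M_j(a) = \frac{1}{j + 1}\sum_{i=0}^{j+1} {}_{j+1}C_i\,(-a)^i\,\mu'_{j+1-i}, \qquad (29)$$

$$M_j(a) = \frac{1}{j + 1}\sum_{i=0}^{j+1} {}_{j+1}C_i\,(\mu'_1 - a)^i\,\mu_{j+1-i}. \qquad (30)$$

Also setting $a = 0$,

$$M_j = \frac{\mu'_{j+1}}{j + 1}, \qquad (31)$$

$$M_j = \frac{1}{j + 1}\sum_{i=0}^{j+1} {}_{j+1}C_i\,\mu_1'^{\,i}\,\mu_{j+1-i}. \qquad (32)$$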




It may be shown that the existence of $M_j(a)$ implies that of the $\mu'_i$,
$i = 1, \ldots, j + 1$, and conversely if $\mu'_j$ exists then so do the $M_i(a)$,
$i = 0, \ldots, j - 1$. The following formulas are obtained by the opposite integration
by parts, taking two different forms for $\int f(x)\,dx$, namely $F(x)$ and $-[1 - F(x)]$,
to avoid indeterminate situations:

$$\mu'_j = \int_a^{\infty} x^j f(x)\,dx + \int_{-\infty}^{a} x^j f(x)\,dx$$

$$= -\left[x^j[1 - F(x)]\right]_a^{\infty} + j\int_a^{\infty} x^{j-1}[1 - F(x)]\,dx + \left[x^j F(x)\right]_{-\infty}^{a} - j\int_{-\infty}^{a} x^{j-1}F(x)\,dx.$$

The contributions of the first and third terms at $\pm\infty$ vanish by the contact assumption,
and their values at $x = a$ combine to $a^j$. Then using $(x - a + a)^{j-1}$ for $x^{j-1}$,

$$\mu'_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i\,a^i M_{j-1-i}(a) + a^j, \qquad j > 0. \qquad (33)$$

Also in the same manner

$$\mu_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i\,(a - \mu'_1)^i M_{j-1-i}(a) + (a - \mu'_1)^j, \qquad j > 1,$$

$$\mu_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i\,[-M_0(a)]^i M_{j-1-i}(a) + [-M_0(a)]^j, \qquad j > 1, \qquad (34)$$

using (29), $M_0(a) = \mu'_1 - a$. Letting $a = 0$,

$$\mu'_j = j M_{j-1}, \qquad j > 0, \qquad (35)$$

$$\mu_j = j\sum_{i=0}^{j-1} {}_{j-1}C_i\,(-M_0)^i M_{j-1-i} + (-M_0)^j, \qquad j > 1. \qquad (36)$$

An interesting graphical property of $F(x)$ may be seen from (35) with $j = 1$ by
taking $\mu'_1 = 0$. Then $M_0 = 0$ and hence $\int_0^{\infty}[1 - F(x)]\,dx = \int_{-\infty}^{0} F(x)\,dx$.
Thus the mean is that ordinate which equates the two areas bounded by (i)
$y = F(x)$, $y = 0$ and $x = \mu'_1$ and (ii) $y = F(x)$, $y = 1$ and $x = \mu'_1$.

It is worth noting that the expressions (34) and (36) have the same coefficients,
independent of $a$. This is to be expected because of the invariance of $\mu_j$ under
translation.

If $a = \mu'_1$, then (30) simplifies to $M_j(\mu'_1) = \frac{\mu_{j+1}}{j + 1}$. Lastly, expressions
for the $\alpha_j$ in terms of the $M_j(a)$'s are given:

$$\alpha_3 = \frac{3M_2(a) - 6M_1(a)M_0(a) + 2M_0^3(a)}{[2M_1(a) - M_0^2(a)]^{3/2}},$$

$$\alpha_4 = \frac{4M_3(a) - 12M_2(a)M_0(a) + 12M_1(a)M_0^2(a) - 3M_0^4(a)}{[2M_1(a) - M_0^2(a)]^{2}}. \qquad (37)$$
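The $\alpha$'s in (37) should not depend on the choice of $a$. The numerical sketch below uses the unit exponential distribution, $F(x) = 1 - e^{-x}$ for $x > 0$, for which $\alpha_3 = 2$ and $\alpha_4 = 9$. It computes $M_0(a), \ldots, M_3(a)$ from (22) by the trapezoidal rule for several origins $a$ and then applies (37). The truncation points and step are our own choices.

```python
import math

def trapezoid(g, u, v, cells):
    """Composite trapezoidal rule for the integral of g over [u, v] (0 if v <= u)."""
    if v <= u:
        return 0.0
    step = (v - u) / cells
    return step * (0.5 * g(u) + 0.5 * g(v) + sum(g(u + i * step) for i in range(1, cells)))

def F_exp(x):
    """Unit exponential distribution: F(x) = 1 - e^{-x} for x > 0, and 0 otherwise."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def M(j, a, lo=-5.0, hi=50.0, cells=25_000):
    """M_j(a) from definition (22), with the infinite ranges cut at lo and hi."""
    upper = trapezoid(lambda x: (x - a) ** j * (1.0 - F_exp(x)), a, hi, cells)
    lower = trapezoid(lambda x: (x - a) ** j * F_exp(x), lo, a, cells)
    return upper - lower

def alphas_37(a):
    """alpha_3 and alpha_4 from M_0(a), ..., M_3(a) by (37)."""
    m0, m1, m2, m3 = (M(j, a) for j in range(4))
    mu2 = 2 * m1 - m0 ** 2
    a3 = (3 * m2 - 6 * m1 * m0 + 2 * m0 ** 3) / mu2 ** 1.5
    a4 = (4 * m3 - 12 * m2 * m0 + 12 * m1 * m0 ** 2 - 3 * m0 ** 4) / mu2 ** 2
    return a3, a4

for a in (-1.0, 0.5):
    print(a, alphas_37(a))
```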

The discrete case has been carried through in an exactly similar manner with
the use of finite rather than infinitesimal calculus. Only the results will be
stated here. The notation used is that of Steffensen [2].



AT,(a) 


“ + (>*- l)A)' ,, *(-l) , .l/ l , r 




(38) 


r+» 0 

i 







. f-lV" 







+ - + 1 (« + o-- 

■ DA]'" 1 ' 

*. i > 

(30) 

il/o(a) 

= j)/q 4* ft 





(40) 

M, 

- m + <« + 






l~Q 

j + i 




(41) 

M,(a) 

1 IU 

= — . V 
J + 1 & 

^ £ iCr k\ 

~ 4 f/ t - 


j > 0 

(42) 

Mo (a) 

— Mi — a 





(43) 

M,(a) 

i i+i 

= y 

O l O'" ; ’*’ 

"'£* C ' 

v + h ~ 

't j > 1 

(44) 

M, = 

= _L_ y 
i + l 

Ar««r ft! ' 

D* +y+i , 

i > o 

(45) 

Mo — Hi 





(46) 

M, - 

i j+l 

“ J+l r5 ‘ 

“'£* c ' n < 

+ A)*~', 

i > o 

(47) 

i 

Mi = 


r)(a - 



Mi = 

= [-il/o(a)) J 





(48) 



t y ^ 






t*-( 


~r)(- 

■il/o(a) - 

/jjU-f-II* 

(49) 

/ 

Mi = 

: £ Mrh^ 1 £ ££ kl r n t-r-l 

r ~° *-7+i *! rP ; 




(50) 

Mi = 

'(-W + Sif r t h^ A -°J k C(k 
r~t 1.7+1 jfc! 

“ r)[— 

Mo - A]' 

:*-r~D* 





The writer has verified that under certain fairly general conditions the discrete
case (38)-(50) approaches the continuous case (26)-(36) as $h \to 0$.
The following three propositions are merely stated without proof, since they
follow so immediately from (23), (25), (31), (45), (21), (2) and (4).

Proposition 1. Given a set of functions $F_i(x)$ and positive constants
$k_i$, $i = 1, \ldots, n$, for which $\sum_{i=1}^{n} k_i = 1$, then for $F(x) = \sum_{i=1}^{n} k_i F_i(x)$,
$M_j = \sum_{i=1}^{n} k_i\,{}_iM_j$, if all the latter exist.

Proposition 2. In the above notation, if all the ${}_i\mu'_1$ are equal, then $\mu_j = \sum_{i=1}^{n} k_i\,{}_i\mu_j$,
when the latter exist.

Proposition 3. If, in addition to the above hypotheses, all the ${}_i\mu_2$ are equal,
then

$$\alpha_j = \sum_{i=1}^{n} k_i\,{}_i\alpha_j. \qquad (51)$$

These propositions are sometimes convenient in forming a linear combination
of functions $F(x)$ to obtain a function with desired properties. It may be noted
that Proposition 1 is still algebraically true even with negative $k_i$'s, but these
might give negative derivatives $f(x)$ for $F(x)$.
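Proposition 3 can be seen at work with exact fractions. The two distributions below are hypothetical. They are chosen so that both have mean 0 and $\mu_2 = 4$, which makes $\sigma = 2$ exactly. Mixing them with weights $k_1 = 1/3$ and $k_2 = 2/3$ gives $\alpha_3$ and $\alpha_4$ equal to the weighted averages of the components' values, as (51) states.

```python
from fractions import Fraction as Fr

def central_moments(pmf):
    """Mean and central moments mu_2, mu_3, mu_4 of a finite distribution {value: prob}."""
    mean = sum(p * x for x, p in pmf.items())
    mu = {j: sum(p * (x - mean) ** j for x, p in pmf.items()) for j in (2, 3, 4)}
    return mean, mu

def mixture(components):
    """Probability function of F = sum k_i F_i, for components [(k_i, pmf_i)]."""
    out = {}
    for weight, pmf in components:
        for x, p in pmf.items():
            out[x] = out.get(x, Fr(0)) + weight * p
    return out

# Two hypothetical distributions with mean 0 and mu_2 = 4 (so sigma = 2 exactly).
A = {Fr(-1): Fr(4, 5), Fr(4): Fr(1, 5)}       # alpha_3 = 3/2, alpha_4 = 13/4
B = {Fr(-2): Fr(1, 2), Fr(2): Fr(1, 2)}       # alpha_3 = 0,   alpha_4 = 1
mean, mu = central_moments(mixture([(Fr(1, 3), A), (Fr(2, 3), B)]))
sigma = 2                                     # since mu[2] == 4
print(mean, mu[2], mu[3] / sigma ** 3, mu[4] / sigma ** 4)   # 0 4 1/2 7/4
```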

6. An algebraic function, $F(x) = 1 - \frac{1}{(1 + x^c)^k}$. This simple algebraic cumulative
function will be discussed in detail. The $\alpha_j$ can be calculated directly by
the application of (23), (36) and (21). The resulting $\alpha_3$ and $\alpha_4$ values cover a
broad range, within which those of many empirical and theoretical distributions
lie. A method of finding such cumulative functions with desired $\alpha_3$ and $\alpha_4$
will be given. Several graduations are presented for illustration.

This function appears in Bierens de Haan [1] and has the desired properties.
The writer has not yet found a probability justification for the function. However,
since the $\alpha_j$ are so close to those of functions which can be so supported,
it seems that it may eventually prove to be at least some definite approximation
to a probability situation.

The complete definition is

$$F(x) = \begin{cases} 1 - \dfrac{1}{(1 + x^c)^k}, & x > 0, \\ 0, & x < 0, \end{cases} \qquad (52)$$

where $c, k \ge 1$ are real numbers. The probability function

$$F'(x) = f(x) = \frac{kc\,x^{c-1}}{(1 + x^c)^{k+1}} \qquad (53)$$

is unimodal at $x = \left(\frac{c - 1}{ck + 1}\right)^{1/c}$ if $c > 1$, and L-shaped if $c = 1$.
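A brief check of the mode stated after (53). The sketch finds the maximum of $f(x)$ on a fine grid for a few illustrative $(c, k)$ pairs and compares it with $\left(\frac{c-1}{ck+1}\right)^{1/c}$. It also confirms that $f$ decreases when $c = 1$. The grid size and parameter pairs are arbitrary.

```python
import math

def burr_pdf(x, c, k):
    """Probability function (53): f(x) = k c x^(c-1) / (1 + x^c)^(k+1) for x > 0."""
    return k * c * x ** (c - 1) / (1 + x ** c) ** (k + 1)

def burr_mode(c, k):
    """Mode stated after (53), valid for c > 1."""
    return ((c - 1) / (c * k + 1)) ** (1 / c)

def grid_argmax(c, k, hi=5.0, points=50_001):
    """Location of the largest value of f on a uniform grid over (0, hi]."""
    xs = (hi * i / (points - 1) for i in range(1, points))
    return max(xs, key=lambda x: burr_pdf(x, c, k))

for c, k in [(3, 2), (5, 1), (2.5, 4)]:
    print(c, k, burr_mode(c, k), grid_argmax(c, k))
```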





Use of (23) on (52) gives

$$M_j = \int_0^{\infty}\frac{x^j\,dx}{(1 + x^c)^k}, \qquad j < ck - 1. \qquad (54)$$

But from Bierens de Haan [1],

$$\int_0^{\infty}\frac{x^{q-1}\,dx}{(1 + x^c)^k} = \frac{(c - q)^{k-1|c}\,\pi}{c^k (k - 1)!\,\sin(q\pi/c)}, \qquad q < c, \qquad (55)$$

where $a^{r|c} = a(a + c)\cdots(a + [r - 1]c)$. Hence

$$M_j = \frac{(c - j - 1)^{k-1|c}\,\pi}{c^k (k - 1)!\,\sin\frac{(j + 1)\pi}{c}}, \qquad j < c - 1. \qquad (56)$$

However, if $j \ge c - 1$, then (55) can still be used by reducing the exponent
of $x$ by $c$: $x^{j-c}(1 + x^c) - x^{j-c} = x^j$. Formula (56) is only good for integral values of $k$.

A more general formula is obtainable by letting $1 + x^c = 1/s$. Then

$$M_j = \frac{1}{c}\int_0^1 (1 - s)^{\frac{j+1}{c} - 1}s^{k - \frac{j+1}{c} - 1}\,ds = \frac{\Gamma\left(\frac{j+1}{c}\right)\Gamma\left(k - \frac{j+1}{c}\right)}{c\,\Gamma(k)}, \qquad (57)$$

for $j = 0, 1, \ldots$ up through $j < ck - 1$, and $c$, $k$ any real numbers $\ge 1$. To
determine the $\mu$ values, the easiest way is to compute the values of the $M_j$
by (56) or (57), and then to use (36):

$$\mu_2 = 2M_1 - M_0^2, \qquad \mu_3 = 3M_2 - 6M_1M_0 + 2M_0^3,$$

$$\mu_4 = 4M_3 - 12M_2M_0 + 12M_1M_0^2 - 3M_0^4, \quad \text{etc.}$$

Having these, definitions (21) are used for the $\alpha_j$.
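The computation just described can be sketched in a few lines of Python. `math.gamma` evaluates (57) for real $c$ and $k$. The sine form (56) is included for integral $k$ as a cross-check, together with the recursion for $M_j$ from $k$ to $k + 1$ that follows from (57). The function names are ours. The spot values compared against are entries of Table II: $\mu'_1$ and $\sigma$ for $(c, k) = (3, 1)$, $(2, 2)$ and $(1, 3)$.

```python
import math

def M_gamma(j, c, k):
    """M_j by (57): Gamma((j+1)/c) Gamma(k - (j+1)/c) / (c Gamma(k)); needs j < ck - 1."""
    q = (j + 1) / c
    if not q < k:
        raise ValueError("M_j does not exist unless j < ck - 1")
    return math.gamma(q) * math.gamma(k - q) / (c * math.gamma(k))

def M_sine(j, c, k):
    """M_j by (56), for integral k and j < c - 1."""
    q = j + 1
    product = 1.0
    for m in range(1, k):                 # (c - q)(2c - q)...((k-1)c - q)
        product *= m * c - q
    return product * math.pi / (c ** k * math.factorial(k - 1) * math.sin(q * math.pi / c))

def mean_and_sd(c, k):
    """mu'_1 = M_0 by (35) and sigma = sqrt(2 M_1 - M_0^2) by (36)."""
    m0, m1 = M_gamma(0, c, k), M_gamma(1, c, k)
    return m0, math.sqrt(2 * m1 - m0 ** 2)

def alphas(c, k):
    """alpha_3 and alpha_4 via (36) and (21); requires ck > 4."""
    m0, m1, m2, m3 = (M_gamma(j, c, k) for j in range(4))
    mu2 = 2 * m1 - m0 ** 2
    mu3 = 3 * m2 - 6 * m1 * m0 + 2 * m0 ** 3
    mu4 = 4 * m3 - 12 * m2 * m0 + 12 * m1 * m0 ** 2 - 3 * m0 ** 4
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2

print(mean_and_sd(3, 1), mean_and_sd(2, 2), alphas(4, 2))
```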

The results for some integral values of $k$ and $c$ are given in Tables II and III.
These computations were made from (56). Formula (57) shows that for a fixed
$c$, $M_j$ for $k + 1$ is obtained by multiplying $M_j$ for $k$ by $\frac{kc - j - 1}{kc}$. This recursion
relation is very helpful in the computation, because it enables all of the
values of the $M_j$'s for a given $c$ to be found from those for the lowest value of $k$
for which $M_j$ exists. The values which need to be copied down in the computation
of $\mu'_1$, $\sigma$, $\alpha_3$, $\alpha_4$ by a calculating machine are $M_0$, $M_1$, $M_2$, $M_3$,
$M_0^2$, $M_0^3$, $M_0^4$, $6M_0$, $12M_0$, $12M_0^2$, $\mu_2$, $\sigma$, $\sigma^3$, $\sigma^4$, $\mu_3$, $\alpha_3$, $\mu_4$, $\alpha_4$. Because of
heavy cancellation, especially in $\mu_3$ and $\mu_4$, it seemed advisable to use eight significant
figures throughout. Eight-place sines were obtained from Gifford [3].
The values of the $M_j$ for $k = 11$ were also checked by eight-place logarithms
[4]. These verify the values of the $M_j$ for $k < 11$ because of the recurrence calculation.


TABLE II

Mean $\mu'_1$ and Standard Deviation $\sigma$ for $F(x) = 1 - \frac{1}{(1 + x^c)^k}$

(In each cell the upper number is $\mu'_1$ and the lower number is $\sigma$)


x. * 

1 

2 

3 

4 

5 

6 

7 

8 

9 

10 

1 

—• 

1.57080 

1 20920 
97787 

1 11072 
.58060 

1.06896 

42265- 

1.04720 

.33552 

1.03438 

27953 

1.02617 

24019 



•2 


.78540 

.61899 


.83304 

30239 

.85517 

24794 

87266 

,21116 

.S8661 

18433 

89790 

16375 


.91498 

.13411 

3 

.500000 

86603 

.58005“ 

.39118 

.67178 

29349 

.72891 

.24029 

.76965" 

.20461 

.79994 

17852 

.82328 
.15847 

.84178 

14253 

.85680 

.12953 

86923 

11872 

4 

.33333 

.47140 

49087 

.30393 

.59714 

.24784 

1 

.66817 .71834 
21077, 18344 

i 

.75550 

16234 

78408 
.14555- 

.80671 

13187 

.82507 

12053 

.84025 

11097 

6 


.42951 

-.25696 


,6264li .68242 
.19269| .17028 

72402 
.15220 

.75607 

.13743 

.78150” 

.12517 

.80215" 

.11487 


6 

20000 

24495” 

.38656 

.22488 

51088 

20220 


.65513 

.16103 

.60980 
.14505- 

.73447 

.13168 

76196 

.12043 

78432 

.11087 

80286 

.10266 

7 

. 166G7 
19720 

.35435- 

.20274 

.48250 

.18851 

.57029 

.17064 

.63329 

15403 

,68045- 

.13962 

71698 

.12731 

.74609 

.11082 

.76980 

.10782 


8 

.14286 
.16496 

32904 

.18599 

.45952 

17783 

.54992 

.16316 

.61520 

14846 

.66425- 

13528 

.70235 

.12383 

73276 

11394 

.76768 
.10539 

.77820 

.09796 

9 

.12500 

14174 

30847 
.17275 

B 

.53274 

15704 

.59982 

14389 

65041 

.13171 

.68981 

12095- 

.72131 

.11156 



10 

.11111 

.12423 

.29134 

16197 

,42407 

.16197 

.51704 

.15190 

58649 

.14002 

,63836 
.12869 

.67886 

.11851 

.71130 

.10954 

.73783 

.10168 

■H 

11 

.10000 
.11055 

27677 

.15297 

.40993 

.15584 

50499! .57476 

.14749; .13669 

1 

62772 

.12608 

.66916 

.11640 

.70240 

10780 

.72964 

.10021 

.75231 

.09351 


It will be seen from Table III that in most cases the values of $\alpha_3$ and $\alpha_4$ lie
within useful ranges. The graph shows the general relationship between $\alpha_3$,
$k$ and $c$. The curves are the traces of the planes $k = 1, 2, \ldots$ upon the surface
$\alpha_3 = G(c, k)$. Other traces would contain all pairs $(c, k)$ giving a fixed $\alpha_3$.














































2.811 .884 347 085 -.115 

10 17.83 4 122 3.043 2 883 2.928 

2.714 868 .329 .050 - 128 

11 10.48 4.018 3.000 2 800 2.920 


The surfaces for $\mu'$, $\sigma$ and $\alpha_4$ are more irregular. The problem of determining a cumulative function with $\alpha_3 = a$ and $\alpha_4 = b$ is equivalent to the problem of determining a point of intersection of the curves

(58)    $\alpha_3 = G(c, k) = a, \qquad \alpha_4 = H(c, k) = b.$

Direct algebraic solution of this system appears very difficult, and other techniques must be resorted to.


TABLE III

Values of $\alpha_3$ and $\alpha_4$ for $F(x) = 1 - 1/(1 + x^c)^k$ (the upper number is $\alpha_3$, the lower number is $\alpha_4$)

 k \ c      6         7         8         9        10
  1       1.820     1.458     1.225     1.060      .937
         14.77     10.36      8.342     7.215     6.510
  2        .434      .294      .190      .109      .044
          4.106     3.859     3.736     3.673     3.646
  3        .110      .005     -.083     -.152     -.208
          3.358     3.329     3.343     3.378     3.418
  4       -.019     -.125     -.207     -.271     -.325
          3.169     3.205     3.263     3.327     3.393

 -.007/3.098;  -.109/3.105;  -.277/3.243;  -.340/3.324;  -.391/3.401;  -.147/3.005;
 -.210/3.150;  -.323/3.241;  -.435/3.410;  -.181/3.048;  -.279/3.144;  -.355/3.244;
 -.415/3.330;  -.465/3.430;  -.303/3.143;  -.378/3.248;  -.438/3.349;  -.220/3.033;
 -.322/3.143;  -.39/3.252;   -.242/3.030;  -.336/3.141;  -.410/3.257;  -.470/3.304;
 -.510/3.162;  -.254/3.027;  -.348/3.140;  -.422/3.201;  -.481/3.371;  -.530/3.470



































































CUMULATIVE FREQUENCY FUNCTIONS


227


One method is to use only integral values of $k$, and then for each $k$ to interpolate for the value of $c$ giving the desired $\alpha_3$. For each such pair of $c$ and $k$, find $\alpha_4$ by interpolation. Then, choosing the pairs having $\alpha_4$ just above and just below the desired value, take the proper linear combination (51). This gives a combination function which has both $\alpha_3$ and $\alpha_4$ at the desired values, and which will be an approximation to the single function with non-integral $k$ having the given $\alpha_3$ and $\alpha_4$. This method of linear combinations might be extended to fit $\alpha_5$ by using three integral values of $k$.
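The step "for each $k$, find the $c$ giving the desired $\alpha_3$" can also be done by root-finding on exact moments instead of by interpolation. The sketch below substitutes bisection for the graphical or Stirling interpolation described above, so it is a different technique. It assumes the moment formula $\mu'_r = k\,B(k - r/c,\ 1 + r/c)$, and it assumes the caller supplies a bracket $[lo, hi]$ over which $\alpha_3$ minus the target changes sign. The function names are our own. For $k = 4$ and a target $\alpha_3 = -.019$ it returns a value of $c$ close to 6.

```python
import math


def burr_alpha34(c, k):
    """(alpha3, alpha4) of F(x) = 1 - 1/(1 + x**c)**k, assuming c*k > 4."""
    def raw(r):
        # assumed closed form for the r-th raw moment: k * B(k - r/c, 1 + r/c)
        a, b = k - r / c, 1 + r / c
        return k * math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

    m1, m2, m3, m4 = raw(1), raw(2), raw(3), raw(4)
    var = m2 - m1 ** 2
    a3 = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5
    a4 = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return a3, a4


def c_for_alpha3(target, k, lo, hi, tol=1e-10):
    """Bisection for c in [lo, hi] with alpha3(c, k) = target (needs a sign change)."""
    def g(c):
        return burr_alpha34(c, k)[0] - target

    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("target alpha3 is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)


# integral k = 4: find c giving alpha3 = -.019, then read off alpha4 at that c
c = c_for_alpha3(-0.019, k=4, lo=5.0, hi=7.0)
alpha3, alpha4 = burr_alpha34(c, 4)
print(f"c = {c:.4f}, alpha3 = {alpha3:.4f}, alpha4 = {alpha4:.4f}")
```

Repeating this for two neighbouring integral $k$ gives the two component functions whose $\alpha_4$ values bracket the target.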

The interpolations may be done graphically by use of Figure 1 and others like it, or one may use Stirling's formula [5]. The interpolation for $c$ from $\alpha_3$ is an inverse interpolation, while that for $\alpha_4$ from $c$ is direct. Sometimes it is more accurate


228


IRVING W. BURR


to use Newton's formula [5, p. 36] when the values in one direction increase rapidly.

Use of a single function $F(x)$ for a graduation is easily accomplished. First, obtain the $c$ and $k$ to be used so that $\alpha_3$ is correct and $\alpha_4$ is as close to the given value as possible. Then determine $\mu'$ and $\sigma$ from Table II by interpolation. Change the scale and origin of the original values of the variable $X$ to those $x$'s corresponding to $F(x) = 1 - 1/(1 + x^c)^k$, through

(59)    $\dfrac{x - \mu'}{\sigma} = \dfrac{X - M}{S}$,

where $M$ and $S$ are the mean and standard deviation of the given distribution. Now compute the values of $1/(1 + x^c)^k$ for the various values of $x$. The differences of these results are equal to the differences of $F(x)$, which by (1) are the probabilities for the given ranges of $X$. Multiplication by the total frequency will yield the theoretical frequencies, if desired.
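The graduation recipe is short enough to write out: standardize through (59), difference $1/(1 + x^c)^k$, and multiply by the total. In this sketch the function names are hypothetical and the two outermost classes are treated as open-ended. The $\mu'$ and $\sigma$ of $F(x)$ are computed from the assumed moment formula $\mu'_r = k\,B(k - r/c,\ 1 + r/c)$ rather than interpolated from a table. For illustration, the usage lines take $c = 6$, $k = 4$, $M = .02035$, $S = 2.5723$ and 8585 cases, with class limits $-10.5, -9.5, \ldots, 12.5$.

```python
import math


def burr_mean_sd(c, k):
    """Mean and standard deviation of F(x) = 1 - 1/(1 + x**c)**k (needs c*k > 2)."""
    def raw(r):
        a, b = k - r / c, 1 + r / c
        return k * math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))
    m1 = raw(1)
    return m1, math.sqrt(raw(2) - m1 * m1)


def graduate(limits, total, c, k, M, S):
    """Theoretical class frequencies; `limits` are the ascending interior class limits of X."""
    mean_f, sd_f = burr_mean_sd(c, k)

    def upper_tail(X):
        # (59): (x - mean_f)/sd_f = (X - M)/S, then 1 - F(x) = 1/(1 + x^c)^k
        x = mean_f + sd_f * (X - M) / S
        return 1.0 if x <= 0 else 1.0 / (1.0 + x ** c) ** k

    # open-ended first and last classes: 1 - F runs from 1 down to 0
    tails = [1.0] + [upper_tail(X) for X in limits] + [0.0]
    return [total * (tails[i] - tails[i + 1]) for i in range(len(tails) - 1)]


limits = [X + 0.5 for X in range(-11, 13)]          # -10.5, -9.5, ..., 12.5
freqs = graduate(limits, 8585, c=6, k=4, M=0.02035, S=2.5723)
for X, f in zip(range(-11, 14), freqs):
    print(f"{X:6.1f} {f:9.2f}")
```

The printed frequencies can be compared with the $F(x)$ column of Table IV. Small differences are to be expected because this sketch uses exact moments instead of tabled ones.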

If the graduation is to be done by a combination of two functions, the work is carried out for each as described above, and then the frequencies are combined by the same linear combination as that by which the component $\alpha_4$'s must be combined to give the desired $\alpha_4$. This may readily be seen by considering the separate cumulative functions in terms of the standard variable $t$, whence the means and $\sigma$'s are 0 and 1 and (51) is applicable. Then the differences of $G(t) = h_1 G_1(t) + h_2 G_2(t)$ are sought. But these can be found by taking the same linear combination of the separate differences of the functions $G_1(t)$ and $G_2(t)$. These differences, in turn, are merely those computed from the respective sets of $x$ values.
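Because both components are in standard measure and share the same $\alpha_3$, the mixture keeps mean 0, variance 1 and that $\alpha_3$, while its $\alpha_4$ is the same weighted average of the component $\alpha_4$'s. A minimal sketch of the two steps follows, with hypothetical function names. The inputs 3.795, 3.655 and a target of 3.698 are illustrative values only.

```python
def combination_weights(alpha4_1, alpha4_2, alpha4_target):
    """Weights (h, 1 - h) with h*alpha4_1 + (1 - h)*alpha4_2 = alpha4_target."""
    if alpha4_1 == alpha4_2:
        raise ValueError("components must have different alpha4")
    h = (alpha4_target - alpha4_2) / (alpha4_1 - alpha4_2)
    if not 0.0 <= h <= 1.0:
        raise ValueError("target alpha4 is not between the component values")
    return h, 1.0 - h


def combine_frequencies(freqs_1, freqs_2, h):
    """Apply the same linear combination to the two sets of graduated frequencies."""
    return [h * f1 + (1.0 - h) * f2 for f1, f2 in zip(freqs_1, freqs_2)]


h1, h2 = combination_weights(3.795, 3.655, 3.698)
print(round(h1, 4), round(h2, 4))  # 0.3071 0.6929
```

Requiring $0 \le h \le 1$ keeps the combination a proper mixture, so the combined frequencies stay non-negative.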

For illustration, three graduations are given. The first is a nearly normal distribution of heights from Rietz [5]. For this distribution, $M = .02035$, $S = 2.5723$, $\alpha_3 = -.0124$, $\alpha_4 = 3.149$. The graduation was done by taking the function

$F(x) = 1 - \dfrac{1}{(1 + x^6)^4}$,

which has the nearly normal characteristics $\alpha_3 = -.019$, $\alpha_4 = 3.169$. The object was to take a simple cumulative function with integral $k$ and $c$ to show how a satisfactory job can be done on a nearly normal distribution. For this function $\mu' = .75550$ and $\sigma = .16231$. Then


$x = .06311X + .75118$,

into which are substituted the $X$ class limits $-11.5$, $-10.5$, etc. From these, the corresponding values of $8585/(1 + x^6)^4$ are calculated and differenced to give the theoretical frequencies for the 8585 cases. The results are given in Table IV.

The fit obtained by use of $F(x)$ is good. One comparison test is that of $\chi^2$. The eight classes $-11, -10, -9, 9, 10, 11, 12, 13$ were grouped together. The results were

$\chi^2 = 21.210$ for $F(x)$ and $\chi^2 = 23.479$ for the normal curve,





as compared to

$P(\chi^2 > 22.31) = .10, \qquad P(\chi^2 > 19.31) = .20$

for 15 degrees of freedom (18 classes minus 3 for linear restrictions). One reason for this somewhat lower $\chi^2$ for $F(x)$ may be that its $\alpha_3$ and $\alpha_4$ are closer to


TABLE IV

     X      Observed        Graduated frequency    Graduated frequency
            frequency [5]   by F(x)                of normal [5]
  -11.0                         .00                     .16³
  -10.0          2              .43                     .67
   -9.0          4             3.23                    2.84
   -8.0         14            13.17                   10.30
   -7.0         41            39.81                   32.11
   -6.0         83            97.87                   86.03
   -5.0        100           206.72                  198.17
   -4.0                      385.31                  392.43
   -3.0                      630.55                  668.11
   -2.0                                              977.92
   -1.0       1223          1216.47                 1230.63
     .0       1320          1353.08                 1331.41
    1.0       1230          1278.39                 1238.41
    2.0       1063          1013.80                  990.33
    3.0                      676.12                  680.86
    4.0
    5.0
    6.0         70
    7.0         32
    8.0         10            10.84
    9.0          5             5.33                    3.01
   10.0          2             2.01                     .72
   11.0                         .77                     .15
   12.0                         .30                     .03
   13.0                         .10³
  Total       8585          8585.00                 8584.99


those of the observed distribution than are those of the normal function. This gives it a better fit in the tails of the distribution. Nevertheless, this example does illustrate how one of the simplest of the cumulative functions with “normal” characteristics can be used without specifically fitting $\alpha_3$ and $\alpha_4$. It may also be mentioned that $F(x)$ for $c = 5$, $k = 6$ has $\alpha_3$ and $\alpha_4$ even closer to the normal


3 Total of stump frequency. 










TABLE V

    X     Observed    F(x), k = 4,   F(x), k = 5,   Combined, .3063 (k = 4)   Type III [5]
          frequency   c = 3.228      c = 2.944      + .6937 (k = 5)
   -8.0        3         .00            .00              .00
   -7.0        9         .80                             .27                       2
   -6.0       40       39.58          25.07            29.52                      27
   -5.0      167      180.78         173.27           176.90                     142
   -4.0      372      433.80         415.79           142.13                     410
   -3.0      718      768.83         791.72           781.71                     790
   -2.0     1186     1110.06        1134.52          1128.80                    1186
   -1.0     1402     1383.00        1384.99          1384.40                    1441
     .0     1498     1492.01        1477.80          1482.20                    1502
   +1.0     1460     1419.70        1399.70          1405.83                    1385
    2.0     1142     1205.81        1190.80          1195.40                    1158
    3.0      913      920.59         921.47           923.01                     891
    4.0      642      654.00         656.82           655.06                     641
    5.0      435      430.66         436.78           434.90                     434
    6.0      235      268.70         274.63           272.81                     280
    7.0      167      161.10         165.27           163.99                     173
    8.0      133       93.99          96.23            95.55                     102
    9.0       47       53.88          51.77            54.50                      59
   10.0       29       30.02          30.70            30.68                      33
   11.0       13       17.37          17.07            17.16                      18
   12.0        9        9.86           9.46             9.58                       9
   13.0        5        5.04           5.20             6.38                       5
   14.0        8        3.26           2.03             3.03
   15.0        2        1.89           1.60             1.73                       1
   16.0                 1.12            .93              .90
   17.0                  .66            .53              .67
   18.0                  .41            .31              .34
   19.0                  .24            .18              .20
   20.0                  .16            .11              .13
   21.0                  .27*           .17*             .20*
  Total   10701     10701.00       10701.00         10700.99                   10701


TABLE VI

 Observed [6]   Type III [6]   Type A [6]   Edgeworth [6]   F(x)
       3              4             5              4           4
      20             17            22             17          19
      38             42            47             42          42
      63             59            60             59          56
      51             53            50             53          52
      29             33            27             32          34
      21             15            13             15          16
       4              5             4              6           5
       0              1             1              2           1
       1              0             0              1           0
     230            229           229            231         229
 $\chi^2$           4.54          7.55           5.86        4.03


Stump frequency, 


230 









values, but it does not give quite as good a fit because it tends to decrease too rapidly on the left.

The second example is also from Rietz [5, p. 108ff.]. For this distribution, $M = .?8830$, $S = 2.?180$, $\alpha_3 = .583$ and $\alpha_4 = 3.698$. Two functions were used, with $k = 4$ and $k = 5$. By interpolation:


              $c$      $\mu'$     $\sigma$   $\alpha_3$   $\alpha_4$
  k = 5     2.944    .54200    .22247     .583      3.655
  k = 4     3.228    .61577    .23823     .583      3.795


Because of the rather rapid increases for smaller values of $c$, Newton's formula [5, p. 36] yields better approximations than Stirling's [5, p. 38, (12)]. The graduation for each function is carried out as above, and since

$.3063(3.795) + .6937(3.655) = 3.698$,

the linear form

$.3063 f_1 + .6937 f_2 = f$

is used.

Table V gives the component and combined frequencies, and also the frequencies from a Type III curve. The $\chi^2$ values for both are very high, even though the fit appears reasonably good on a graph. This result is due to classes 6 and 8, which tend to cause a high $\chi^2$ for any distribution function with a small number of parameters. The example, however, does show that $F(x)$ can be used to graduate a skewed distribution.

It should further be noted that the component functions were used only to obtain an approximation to a single function with $4 < k < 5$, for which $\alpha_3$ and $\alpha_4$ are simultaneously correct. When tables more complete than Tables II and III are available, such a single function can be found.

The third example of graduations is from Elderton [6]. The measures were treated as a discrete variable in computing $\alpha_3$ and $\alpha_4$. A single function with $c = 3.102$, $k = 11$ was used. This function had $\alpha_3$ at the observed value of .2930, while $\alpha_4$ was 2.973 as compared to the observed 2.980. The results, along with those by classical methods, are shown in Table VI. The above $\chi^2$ values were obtained by grouping the first and the last three class frequencies. The values are approximate because of rounding. However, they do show that $F(x)$ gives a comparable graduation.

Besides aiding in the problem of graduation, this cumulative function should prove of value in the approximation of known or population distributions, as for example $(p + q)^n$. However, much more work needs to be done before this can be more than a conjecture.

7. Conclusion. This paper has stressed the advantages obtained by the direct use of the cumulative function. A number of useful functions have been considered. A general method for fitting any cumulative function by the construction of a table has been suggested. A particular method depending





















NOTES

This section is limited to brief research and expository articles, notes on methodology and other short items.


AN APPROXIMATE NORMALIZATION OF THE ANALYSIS OF
VARIANCE DISTRIBUTION


By Edward Paulson¹
Columbia University


The statistic $F = s_1^2/s_2^2$, where $s_1^2$ and $s_2^2$ are two independent estimates of the same variance, has played an essential part in modern statistical theory. All tests of significance involving the testing of a linear hypothesis, which includes the analysis of variance and covariance and multiple regression problems, can be reduced to finding the probability integral of the $F$ distribution. This distribution (and the equivalent distribution of $z = \frac{1}{2}\log F$) has so far been directly tabulated only for the 20, 5, 1, and 0.1 percent levels of significance [1]. To find the critical value of $F$ for some other probability level would require the use of Pearson's extensive triple-entry tables [2], which are not very convenient to use for this purpose and, in addition, are inadequate for some ranges of the parameters.

It therefore appears that it might be of some practical value to have an approximate method of determining the critical values of $F$ for other probability levels. A solution will be given based on a modified statistic $U$, a function of $F$, so selected as to tend to have a nearly normal distribution with zero mean and unit variance. This normalized statistic will have the additional advantage that further tests are possible with normalized variates, as pointed out by Hotelling and Frankel [3].

$F$ can be written in the form

$F = \dfrac{\chi_1^2/n_1}{\chi_2^2/n_2}$,

where $\chi_1^2$ and $\chi_2^2$ have the chi-square distribution with $n_1$ and $n_2$ degrees of freedom respectively. It is known from the work of Wilson and Hilferty [4] that $(\chi^2/n)^{1/3}$ is nearly normally distributed with mean $1 - 2/(9n)$ and variance $2/(9n)$. An obvious approach to the problem of securing an approximation to the $F$ distribution is to regard $F^{1/3}$ as the ratio of two normally distributed variates. In general the distribution of the ratio $v = y/x$, where $y$ and $x$ are normally and independently distributed with means $m_y$ and $m_x$ and standard deviations $\sigma_y$ and $\sigma_x$,


¹ Work done under a grant-in-aid from the Carnegie Corporation of New York.

233 



234


EDWARD PAULSON


is not expressible in simple form. However, Fieller [5] has shown that a function $R$ of $v$, namely

$R = \dfrac{m_x v - m_y}{\sqrt{\sigma_x^2 v^2 + \sigma_y^2}}$,

will be nearly normally distributed with zero mean and unit variance, provided the probability of $x$ being negative is small. In the given problem it follows that we can regard


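(1)    $U = \dfrac{\left(1 - \dfrac{2}{9n_2}\right) F^{1/3} - \left(1 - \dfrac{2}{9n_1}\right)}{\sqrt{\dfrac{2}{9n_2}\, F^{2/3} + \dfrac{2}{9n_1}}}$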



as nearly normally distributed (with zero mean and unit variance) provided $n_2 \ge 3$, for with $n_2 = 3$ the probability of the denominator of $F^{1/3}$ being negative is only .0003. If it is desired to use the lower tail of the $F$ distribution, then the statistic $U$ should only be used if $n_1$ is also $\ge 3$. Ordinarily, in most applications only the upper tail of the $F$ distribution is used, and $n_2$, which corresponds to the number of degrees of freedom in the estimate of the error variance, will be much greater than 3.
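Both approximations used here are easy to evaluate. The sketch below uses our own function names and takes normal quantiles from `statistics.NormalDist` in the Python standard library (3.8+). It computes the Wilson–Hilferty chi-square quantile, and the statistic $U$ obtained by applying Fieller's function to $F^{1/3}$ with the Wilson–Hilferty means and variances. As a rough check, the Wilson–Hilferty upper 10 percent point for 15 degrees of freedom comes out near 22.3. $U$ at $F = 3.84$ with $n_1 = 4$, $n_2 = 8$ is close to 1.645.

```python
import math
from statistics import NormalDist


def wilson_hilferty_quantile(p, n):
    """Approximate chi-square quantile: (chi2/n)**(1/3) ~ Normal(1 - 2/(9n), 2/(9n))."""
    z = NormalDist().inv_cdf(p)
    v = 2.0 / (9.0 * n)
    return n * (1.0 - v + z * math.sqrt(v)) ** 3


def paulson_u(F, n1, n2):
    """Fieller's function applied to F**(1/3); approximately N(0, 1)."""
    v1, v2 = 2.0 / (9.0 * n1), 2.0 / (9.0 * n2)
    u = F ** (1.0 / 3.0)
    return ((1.0 - v2) * u - (1.0 - v1)) / math.sqrt(v2 * u * u + v1)


print(round(wilson_hilferty_quantile(0.90, 15), 2))  # compare with 22.31
print(round(paulson_u(3.84, 4, 8), 3))               # compare with 1.645
```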

The following tables show the degree of accuracy of the approximation. The exact values of $F$ corresponding to various levels of significance are compared


              $n_1 = 1$, $n_2 = 10$
   P        Approximation     Exact value
            ($\sqrt{F}$)
  .20           1.37             1.37
  .05           2.21             2.23
  .01           3.16             3.17
  .001          4.63             4.59
  .0001         6.40             6.22


              $n_1 = 4$, $n_2 = 8$              $n_1 = 6$, $n_2 = 12$
   P       Approximation   Exact value    Approximation   Exact value
  .99          .058           .068            .123           .130
  .95          .161           .166            .248           .250
  .80          .407           .400            .497           .490
  .20         1.92           1.92            1.72           1.72
  .05         3.84           3.84            3.00           3.00
  .01         7.12           7.01            4.85           4.82
  .001       15.38          14.39            8.58           8.38



DISTRIBUTION OF ROOTS OF POLYNOMIAL


235


with the approximate values, which are found by solving (1) for $F$, treating it as a quadratic equation in $F^{1/3}$. In these tables $P = \int_F^{\infty} \varphi(F)\,dF$, where $\varphi(F)$ is the probability distribution of $F$. The case $n_1 = 1$ is of special interest, since here $F = t^2$, where $t$ has Student's distribution, and it is shown separately.
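Setting $U$ equal to a normal deviate $z$ and squaring turns the inversion into a quadratic in $u = F^{1/3}$. The minimal sketch below uses our own function name. It keeps the root for which $(1 - 2/9n_2)u - (1 - 2/9n_1)$ has the sign of $z$; that is the root of the unsquared equation, which also covers lower-tail levels $P > .5$. It raises an error when no usable root exists. Under these assumptions it gives $\sqrt{F} \approx 3.16$ for $n_1 = 1$, $n_2 = 10$, $P = .01$, and $F \approx .161$ for $n_1 = 4$, $n_2 = 8$, $P = .95$.

```python
import math
from statistics import NormalDist


def paulson_critical_f(P, n1, n2):
    """Approximate F whose upper-tail probability is P, by solving U = z for F.

    With u = F**(1/3) and v = 2/(9n):  (1 - v2) u - (1 - v1) = z sqrt(v2 u^2 + v1).
    Squaring gives A u^2 + B u + C = 0 with the coefficients below.
    """
    z = NormalDist().inv_cdf(1.0 - P)
    v1, v2 = 2.0 / (9.0 * n1), 2.0 / (9.0 * n2)
    a, b = 1.0 - v1, 1.0 - v2
    A = b * b - z * z * v2
    B = -2.0 * a * b
    C = a * a - z * z * v1
    disc = B * B - 4.0 * A * C
    if A <= 0 or disc < 0:
        raise ValueError("the approximation breaks down for these arguments")
    roots = [(-B + s * math.sqrt(disc)) / (2.0 * A) for s in (1.0, -1.0)]
    # discard the root introduced by squaring: b*u - a must have the sign of z
    u = next(r for r in roots if (b * r - a) * z >= 0)
    return u ** 3


print(round(math.sqrt(paulson_critical_f(0.01, 1, 10)), 2))
print(round(paulson_critical_f(0.95, 4, 8), 3))
```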


REFERENCES

[1] R. A. Fisher and F. Yates, Statistical Tables for Biological, Agricultural, and Medical Research, London, 1938.

[2] Karl Pearson (Editor), Tables of the Incomplete Beta Function, Biometric Laboratory, London, 1934.

[3] H. Hotelling and L. R. Frankel, "The transformation of statistics to simplify their distribution," Annals of Math. Stat., Vol. 9 (1938), pp. 87-96.

[4] E. B. Wilson and M. M. Hilferty, "The distribution of chi-square," Proc. Nat. Acad. Sci., Vol. 17 (1931), pp. 684-688.

[5] E. C. Fieller, "The distribution of the index in a normal bivariate population," Biometrika, Vol. 24 (1932), pp. 428-440.


NOTE ON THE DISTRIBUTION OF ROOTS OF A POLYNOMIAL WITH
RANDOM COMPLEX COEFFICIENTS


By M. A. Girshick


United States Department of Agriculture


In order to obtain the distribution of roots of a polynomial with random complex coefficients, it was found convenient to employ a rather well known theorem on complex Jacobians. Since proofs of this theorem are not very plentiful in the literature, a brief and simple proof of it is presented in this note.

Theorem: Let $n$ analytic functions be defined by

(1)    $w_p = u_p + i v_p = f_p(z_1, z_2, \ldots, z_n), \quad (p = 1, 2, \ldots, n),$

where $z_p = x_p + i y_p$, $i = \sqrt{-1}$. Let $j$ denote the Jacobian of the transformation of the $n$ complex variables defined by (1). That is


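(2)    $$j = \begin{vmatrix} \dfrac{\partial w_1}{\partial z_1} & \cdots & \dfrac{\partial w_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial w_n}{\partial z_1} & \cdots & \dfrac{\partial w_n}{\partial z_n} \end{vmatrix}.$$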


Let furthermore $J$ denote the Jacobian of the transformation of the $2n$ real variables defined by the equations $u_p = u_p(x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n)$ and $v_p = v_p(x_1, x_2, \ldots, x_n; y_1, y_2, \ldots, y_n)$, $(p = 1, 2, \ldots, n)$. That is


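236

M. A. GIRSHICK

(3)    $$J = \begin{vmatrix}
\frac{\partial u_1}{\partial x_1} & \cdots & \frac{\partial u_1}{\partial x_n} & \frac{\partial u_1}{\partial y_1} & \cdots & \frac{\partial u_1}{\partial y_n} \\
\vdots & & \vdots & \vdots & & \vdots \\
\frac{\partial u_n}{\partial x_1} & \cdots & \frac{\partial u_n}{\partial x_n} & \frac{\partial u_n}{\partial y_1} & \cdots & \frac{\partial u_n}{\partial y_n} \\
\frac{\partial v_1}{\partial x_1} & \cdots & \frac{\partial v_1}{\partial x_n} & \frac{\partial v_1}{\partial y_1} & \cdots & \frac{\partial v_1}{\partial y_n} \\
\vdots & & \vdots & \vdots & & \vdots \\
\frac{\partial v_n}{\partial x_1} & \cdots & \frac{\partial v_n}{\partial x_n} & \frac{\partial v_n}{\partial y_1} & \cdots & \frac{\partial v_n}{\partial y_n}
\end{vmatrix} = \begin{vmatrix} U_x & U_y \\ V_x & V_y \end{vmatrix},$$

where $U_x$, $U_y$, $V_x$, $V_y$ denote the $n \times n$ matrices with elements $\partial u_p/\partial x_q$, $\partial u_p/\partial y_q$, $\partial v_p/\partial x_q$, $\partial v_p/\partial y_q$ respectively.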


Then J equals the square of the modulus of j. 

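Proof: Since by hypothesis $w_p$ is analytic, we can set $\dfrac{\partial w_p}{\partial z_q} = \dfrac{\partial v_p}{\partial y_q} - i\,\dfrac{\partial u_p}{\partial y_q}$. Hence $j$ takes on the form

(4)    $j = |V_y - iU_y|$.

Again, since $w_p$ is analytic, we have $\dfrac{\partial u_p}{\partial x_q} = \dfrac{\partial v_p}{\partial y_q}$ and $\dfrac{\partial v_p}{\partial x_q} = -\dfrac{\partial u_p}{\partial y_q}$. That is, $U_x = V_y$ and $V_x = -U_y$. Hence $J$ in (3) has the value

(5)    $J = \begin{vmatrix} V_y & U_y \\ -U_y & V_y \end{vmatrix}$.

Now $J$ can also be written in the form

(6)    $J = \begin{vmatrix} V_y & iU_y \\ iU_y & V_y \end{vmatrix}$.

This follows from the fact that if we multiply each of the last $n$ rows of the expression for $J$ in (6) by $i$ and factor out $i$ from the last $n$ columns, we get the expression for $J$ given in (5).

Now in (6) subtract the $(n + p)$th row from the $p$th row for each $p = 1, 2, \ldots, n$. This yields:

(7)    $J = \begin{vmatrix} V_y - iU_y & iU_y - V_y \\ iU_y & V_y \end{vmatrix}$.

Next add in (7) the $p$th column to the $(n + p)$th column for each $p = 1, 2, \ldots, n$. This yields:

(8)    $J = \begin{vmatrix} V_y - iU_y & 0 \\ iU_y & V_y + iU_y \end{vmatrix} = |V_y - iU_y|\,|V_y + iU_y|$.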









But (8) is precisely the square of the modulus of $|V_y - iU_y|$: since $U_y$ and $V_y$ are real, $|V_y + iU_y|$ is the complex conjugate of $|V_y - iU_y|$. This, in conjunction with (4), proves the theorem.
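The theorem is easy to check numerically for a particular analytic map. The sketch below is an illustration only. The map $f$ is an arbitrary example of ours, and $J$ is obtained by central differences of the real and imaginary parts, with rows $u$ then $v$ and columns $x$ then $y$ as in (3). The complex Jacobian $j$ is written out by hand for this map. A small Gaussian-elimination determinant avoids third-party libraries.

```python
import cmath


def det(m):
    """Determinant by Gaussian elimination with partial pivoting (real or complex entries)."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for cc in range(col, n):
                a[r][cc] -= factor * a[col][cc]
    return result


def f(z):
    """A sample analytic map w = f(z) of two complex variables (illustrative only)."""
    z1, z2 = z
    return [z1 * z2 + cmath.exp(z1), z2 ** 3 - z1]


def complex_jacobian(z):
    """j = det(dw_p/dz_q), differentiated by hand for the sample map f."""
    z1, z2 = z
    return det([[z2 + cmath.exp(z1), z1], [-1, 3 * z2 ** 2]])


def real_jacobian(z, h=1e-6):
    """J of (x, y) -> (u, v) by central differences; column q perturbs x_q, then y_q."""
    n = len(z)
    cols = []
    for q in range(2 * n):
        dz = [0j] * n
        dz[q % n] = h if q < n else 1j * h
        plus = f([a + d for a, d in zip(z, dz)])
        minus = f([a - d for a, d in zip(z, dz)])
        deriv = [(p - m) / (2 * h) for p, m in zip(plus, minus)]
        cols.append([w.real for w in deriv] + [w.imag for w in deriv])
    return det([[cols[q][p] for q in range(2 * n)] for p in range(2 * n)])


z0 = [0.3 + 0.7j, -1.1 + 0.4j]
print(real_jacobian(z0), abs(complex_jacobian(z0)) ** 2)  # agree up to rounding error
```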

Consider the equation

(9)    $z^n - a_1 z^{n-1} + a_2 z^{n-2} - \cdots + (-1)^n a_n = 0,$

where the $a_p$ are complex numbers. We may wish to consider the real and imaginary parts of $a_p$ as random variables having a given joint distribution function, and require to find the probability that one or more roots of (9) will lie in a specified region of the complex plane. In order to answer this question, it is necessary to find the joint distribution of the real and imaginary parts of the roots of (9).

As an example, let us assume that the real and imaginary parts of $a_p$ are normally and independently distributed with zero mean and variance $\sigma^2$. That is, we assume that the distribution density of these quantities is given by

(10)    $\dfrac{1}{(2\pi\sigma^2)^n} \exp\left(-\dfrac{1}{2\sigma^2} \sum_{p=1}^{n} a_p \bar a_p\right),$

where $\bar a_p$ is the conjugate of $a_p$. Let $z_1, z_2, \ldots, z_n$ be the roots of (9). The relationships between the roots and coefficients of (9) are given by

(11)    $a_1 = \sum_{j=1}^{n} z_j, \quad a_2 = \sum_{j<k} z_j z_k, \quad \ldots, \quad a_n = z_1 z_2 \cdots z_n.$

Thus the $a_p$'s are analytic functions of the $z$'s.

In order to find the joint distribution of the real and imaginary parts of the $z$'s, it is necessary to find the real Jacobian $J$ of the transformation defined by (11). Now the complex Jacobian $j$ of the transformation (11) is defined as

(12)    $$j = \begin{vmatrix} \dfrac{\partial a_1}{\partial z_1} & \cdots & \dfrac{\partial a_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial a_n}{\partial z_1} & \cdots & \dfrac{\partial a_n}{\partial z_n} \end{vmatrix}.$$

A simple calculation will show that the value of $j$ in (12) is given by

(13)    $j = \prod_{p=1}^{n-1} \prod_{q=p+1}^{n} (z_p - z_q).$

Hence, applying the theorem proved above, we get

(14)    $J = |j|^2 = \prod_{p=1}^{n-1} \prod_{q=p+1}^{n} |z_p - z_q|^2,$

where the symbol $|\;|$ stands for the modulus.
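Formula (13) can be checked numerically as well. In this sketch the column of $j$ belonging to $z_q$ is built from the fact that $\partial a_p/\partial z_q$ is the $(p-1)$th elementary symmetric function of the remaining roots. The sample roots are arbitrary, the function names are our own, and $|j|^2$ then gives $J$ by the theorem.

```python
import itertools


def elementary_symmetric(z):
    """[e_1, ..., e_n] of the numbers z, i.e. the a_p of (11)."""
    e = [1 + 0j]                                   # e_0
    for root in z:
        # multiply prod(1 + z_i t) by (1 + root*t): new e_j = e_j + root * e_{j-1}
        e = [a + root * b for a, b in zip(e + [0], [0] + e)]
    return e[1:]


def vieta_jacobian(z):
    """Matrix (da_p/dz_q); column q holds e_0, ..., e_{n-1} of the roots other than z_q."""
    n = len(z)
    cols = [[1 + 0j] + elementary_symmetric(z[:q] + z[q + 1:]) for q in range(n)]
    return [[cols[q][p] for q in range(n)] for p in range(n)]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n, result = len(a), 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for cc in range(col, n):
                a[r][cc] -= factor * a[col][cc]
    return result


z = [0.5 + 1.0j, -1.0 + 0.25j, 2.0 - 0.5j, 0.1 - 1.2j]
j = det(vieta_jacobian(z))
product = 1
for p, q in itertools.combinations(range(len(z)), 2):
    product *= z[p] - z[q]                         # right side of (13)
print(abs(j - product), abs(j) ** 2)               # difference ~ 0; |j|^2 is J by (14)
```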





238


HILDA GEIRINGER


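From (10) and (14) we conclude that the joint distribution density of the real and imaginary parts of the roots of (9) is given by

(15)    $\dfrac{1}{(2\pi\sigma^2)^n} \exp\left\{-\dfrac{1}{2\sigma^2}\left[\Big|\sum_{p} z_p\Big|^2 + \Big|\sum_{p<q} z_p z_q\Big|^2 + \cdots + |z_1 z_2 \cdots z_n|^2\right]\right\} \prod_{p<q} |z_p - z_q|^2.$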


A NOTE ON THE PROBABILITY OF ARBITRARY EVENTS

By Hilda Geiringer¹
Bryn Mawr College

In a recently published paper [1] on arbitrary events, the author studies the probability of the occurrence of at least $m$ among $n$ events. Denoting by $p_m(\gamma_1, \gamma_2, \ldots, \gamma_r)$ the probability that at least $m$ among the $r$ events $E_{\gamma_1}, \ldots, E_{\gamma_r}$ occur, and by $q(\alpha_1, \alpha_2, \ldots, \alpha_r)$ the probability of the non-occurrence of the events numbered $\alpha_1, \alpha_2, \ldots, \alpha_r$ and of the occurrence of the $n - r$ others, he proves

-Pl(a r +11 ••'«») + E Pl(n i «r+1 > * * * atn) ~ £ £ pdjl , Yl , «M I i ‘ ' «n) 

(I) Tl Tl 

+ *■• + (-l) r £ pl(l, ft) ** Pin,... .,t . 

(Theorem VI, page 336). From (I) he deduces that a necessary and sufficient condition for the existence of a system of events $E_1, \ldots, E_n$ associated with given values $p_1(\alpha_1, \ldots, \alpha_r)$ is that the expressions on the left side of (I), computed from these $p_1$'s, are $\ge 0$ for all possible combinations of the $\alpha$'s (Theorem VII). He also points out that it was not possible to find similar (necessary and sufficient) conditions for $m > 1$. I wish to show in this note the relation between these theorems and some well known basic facts of the theory of arbitrarily linked events, and to add some remarks.

1. Given $n$ chance variables $x_i$ $(i = 1, \ldots, n)$, denote by $x_i = 1$ the “occurrence of $E_i$”, by $x_i = 0$ its non-occurrence, and by $v(x_1, x_2, \ldots, x_n)$ the probability of “the result $(x_1, x_2, \ldots, x_n)$”, i.e., the probability that the first variable equals $x_1$, the second $x_2$, ..., the last $x_n$; e.g. $v(1, 1, 1, 0, \ldots, 0)$ is the probability that only the first three events occur. Hence the $v$'s are $2^n$ probabilities, arbitrary except for the condition that their sum is 1.

Instead of these $v$'s we often introduce another set of $2^n - 1$ probabilities, namely $p_i$, the probability of the occurrence of $E_i$ $(i = 1, \ldots, n)$; $p_{ij}$, that of the joint occurrence of $E_i$ and $E_j$ $(i, j = 1, \ldots, n)$; ...; $p_{12\cdots n}$, the probability that all the events occur.


¹ Research under a grant-in-aid of the American Philosophical Society.



ARBITRARY EVENTS 


239 


It may be noted that instead of the $p_i, p_{ij}, \ldots, p_{12\cdots n}$ we could quite as well use a system of $q_i, q_{ij}, \ldots, q_{12\cdots n}$, where $q_i$ is the probability of the non-occurrence of $E_i$ (or of the occurrence of $E_i' = E - E_i$), $q_{ij}$ that of the joint non-occurrence of $E_i$ and $E_j$ (of the occurrence of $E_i'E_j'$), and $q_{12\cdots n}$ the probability of $E_1'E_2'\cdots E_n'$.

The use of the $p$'s (or $q$'s) instead of the “elementary probabilities” $v$ is justified by the fact that the $p$'s are $(2^n - 1)$ independent linear combinations of the $v$'s, and that therefore the $v$'s and the $p$'s (or the $v$'s and the $q$'s) determine each other uniquely. There exist in fact the following well known relations, (1) and (2). The first set (1) gives just the definition of the $2^n - 1$ probabilities $p$ in terms of the $v$'s, and the second set expresses the $v$'s by the $p$'s as the result of the solution of the $2^n - 1$ independent linear equations (1). Thus we have, beginning with $p_{12\cdots n}$,

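(1)
$p_{12\cdots n} = v(1, 1, \ldots, 1)$,
$p_{12\cdots n-1} = \sum_{x_n} v(1, 1, \ldots, 1, x_n)$,
$\cdots$
$p_{12} = \sum_{x_3} \cdots \sum_{x_n} v(1, 1, x_3, \ldots, x_n)$,
$\cdots$
$p_n = \sum_{x_1} \cdots \sum_{x_{n-1}} v(x_1, x_2, \ldots, x_{n-1}, 1)$,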


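and solving successively:

(2)
$v(1, 1, \ldots, 1) = p_{12\cdots n}$,
$v(1, 1, \ldots, 1, 0) = p_{12\cdots n-1} - p_{12\cdots n}$,
$\cdots$
$v(1, 1, 0, \ldots, 0) = p_{12} - \sum_{\gamma_1} p_{12\gamma_1} + \sum_{\gamma_1<\gamma_2} p_{12\gamma_1\gamma_2} - \cdots \pm p_{12\cdots n}$,
$\cdots$
$v(0, 0, \ldots, 0, 1) = p_n - \sum_{\gamma_1} p_{\gamma_1 n} + \sum_{\gamma_1<\gamma_2} p_{\gamma_1\gamma_2 n} - \cdots \mp \sum_{\gamma_1<\cdots<\gamma_{n-2}} p_{\gamma_1\cdots\gamma_{n-2} n} \pm p_{12\cdots n}$.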

The successive solution of the system (1) with respect to the “unknowns” $v$ is possible because each new equation in (1) contains exactly one new unknown $v$; e.g. in the equation defining $p_{12}$ the only “unknown” is $v(1, 1, 0, \ldots, 0)$, all the $v$'s with more than two “1”s having already been computed from the foregoing equations.

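If we choose to use the system of the $q$'s, we have in the same way:

(1')
$q_{12\cdots n} = v(0, 0, \ldots, 0)$,
$q_{12\cdots n-1} = \sum_{x_n} v(0, \ldots, 0, x_n)$,
$\cdots$
$q_n = \sum_{x_1} \cdots \sum_{x_{n-1}} v(x_1, x_2, \ldots, x_{n-1}, 0)$,

and the inverse system

(2')
$v(0, 0, \ldots, 0) = q_{12\cdots n}$,
$v(0, \ldots, 0, 1) = q_{12\cdots n-1} - q_{12\cdots n}$,
$\cdots$
$v(1, \ldots, 1, 0) = q_n - \sum_{\gamma_1} q_{\gamma_1 n} + \sum_{\gamma_1<\gamma_2} q_{\gamma_1\gamma_2 n} - \cdots \pm q_{12\cdots n}$.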

Coming back to Chung's theorem, we see that the probability $p_1(\alpha_1, \ldots, \alpha_r)$ that at least one event among $E_{\alpha_1}, \ldots, E_{\alpha_r}$ occurs is evidently

(3)    $p_1(\alpha_1, \ldots, \alpha_r) = 1 - q_{\alpha_1 \cdots \alpha_r}$.

If we introduce this value in (I), all the “1”s introduced by (3) cancel and we get our system (2'). (Of course we could in the same way deduce from (2) a system of equations for $\varphi(\alpha_1, \ldots, \alpha_r) = 1 - p_{\alpha_1 \cdots \alpha_r}$, where $\varphi(\alpha_1, \ldots, \alpha_r)$ is the probability that at most $r - 1$ events among the $r$ given ones occur.)

As the $v$-values on the left side of (2') are $2^n - 1$ independent probabilities, subject only to the restriction that their sum is $\le 1$, we see that the expressions on the right side of (2') must have the same properties. These properties are also sufficient for a system of $q$'s (or for a system of $p_1(\alpha_1, \ldots, \alpha_r)$): indeed, if they are fulfilled, these $2^n - 1$ expressions define by means of (2') a system of elementary probabilities $v(x_1, \ldots, x_n)$. Hence Theorems VI and VII, quoted at the beginning of this note, are rather close consequences of the basic relations of the theory of arbitrary events.

2. Remark 1. We may add one more equation to equations (1), namely

$1 = \sum_{x_1} \cdots \sum_{x_n} v(x_1, \ldots, x_n)$,

thus introducing $v(0, 0, \ldots, 0)$. Then in system (2) the corresponding new equation will be

$v(0, 0, \ldots, 0) = 1 - \sum_{\gamma} p_{\gamma} + \sum_{\gamma_1<\gamma_2} p_{\gamma_1\gamma_2} - \cdots \pm p_{12\cdots n}$

(and analogously for the $q$'s). In this way we get two systems (1) and (2), each consisting of $2^n$ equations, and in (2) the sum of the expressions on the right side is now identically equal to one. Hence the necessary and sufficient conditions will now be that all these $2^n$ expressions must be non-negative.

Remark 2. It is convenient to interpret or prove results of the kind considered here in terms of elementary measure theory: $p_1$ is the measure of a set $E_1$, $p_2$ that of $E_2$, $p_{12}$ that of the intersection $E_1E_2$, etc., and analogously for the $v$'s, e.g. $v(1, 1, 0, \ldots, 0) = m(E_1E_2E_3'\cdots E_n')$. Consider now the equations (2). The first is an identity. In the second, $p_{12\cdots n-1}$ measures the product $E_1E_2\cdots E_{n-1}$, whereas $p_{12\cdots n-1} - p_{12\cdots n}$ is the measure of that part of this product which does not belong to $E_n$, and it therefore equals $m(E_1E_2\cdots E_{n-1}E_n') = v(1, 1, \ldots, 1, 0)$. In the last equation (2), $\sum \cdots \sum p_{\gamma_1\cdots\gamma_{n-2} n}$ is the measure of that part of $E_n$ which belongs to at least $(n - 2)$ other sets (besides $E_n$), whereas this same value minus $p_{12\cdots n}$ is the measure of the part of $E_n$ which belongs to exactly $(n - 2)$ other sets. Subtracting this expression from $\sum \cdots \sum p_{\gamma_1\cdots\gamma_{n-3} n}$, we get the measure of the part of $E_n$ which belongs to exactly $(n - 3)$ other sets, and finally $p_n - \cdots \pm p_{12\cdots n}$ is the measure of that part of $E_n$ which belongs to no other set, i.e. $m(E_1'E_2'\cdots E_{n-1}'E_n) = v(0, 0, \ldots, 0, 1)$. This kind of proof does not require the solution of (1).

Remark 3. According to (1), the $p_i, p_{ij}, \ldots, p_{12\cdots n}$ are the ordinary moments of order $1, 2, \ldots, n$ of $v(x_1, x_2, \ldots, x_n)$. There are of course many more than $2^n - 1$ moments of this $n$-variate distribution, but only $2^n - 1$ of them are different from each other, because $x_i^r = x_i$ when $x_i$ is 0 or 1.

3. Denote by $p_n(x)$, $(x = 0, 1, \ldots, n)$, the probability of getting exactly $x$ successes in $n$ trials. (See e.g. [2], [3].) For the simplest case of arbitrary events, the Bernoulli problem, $p_n(x) = \binom{n}{x} p^x (1 - p)^{n-x}$. Then the probability of at least $x$ successes (of a number of successes $\ge x$) is

(4)    $V_n(x) = p_n(x) + p_n(x + 1) + \cdots + p_n(n)$,

or $p_x(1, 2, \ldots, n)$ in Chung's notation. The $p_n(x)$ are by their definition $(n + 1)$ arbitrary non-negative numbers with sum equal to one. These are the only necessary and sufficient restrictions for $p_n(x)$. $V_n(x)$, the “cumulative” distribution of $p_n(x)$, which is defined for $x$ between $-\infty$ and $+\infty$, is a monotone non-increasing step function with its $(n + 1)$ steps at $x = 0, 1, 2, \ldots, n$ equal to the $p_n(x)$.

Consider next $p_x(\alpha_1, \alpha_2, \ldots, \alpha_r)$, where $r < n$. These are cumulative distributions, each corresponding to one of the probabilities $p_r(x)$, where $p_r(x)$ is the probability of exactly $x$ successes in a group of $r$ trials.² For each group $(\alpha_1, \ldots, \alpha_r)$ the corresponding $p_r(x)$, $(x = 0, 1, \ldots, r)$, are non-negative and have sum equal to one. Hence, if we always omit $p_r(0)$ because of $\sum_{x=0}^{r} p_r(x) = 1$, all the different $p_1(x), p_2(x), \ldots, p_n(x)$ together define

$1 \cdot n + n(n - 1) + \binom{n}{2}(n - 2) + \cdots + n = n2^{n-1}$

values. As $n2^{n-1} > 2^n$ for $n > 2$, we realize that between these $n2^{n-1}$ probabilities there must exist a set of $n2^{n-1} - (2^n - 1)$ identical relations, and the same is true for the corresponding cumulative distributions $V_r(x)$ or $p_x(\alpha_1, \ldots, \alpha_r)$. Thus it seems reasonable that it may be hard to use these $p_x(\alpha_1, \ldots, \alpha_r)$ in the characterization of a problem of arbitrarily linked events if $x > 1$. On the other hand, we have seen in §1 that for $x = 1$ they reduce to the $2^n - 1$ probabilities $q_i, q_{ij}, \ldots, q_{12\cdots n}$, which of course define the system of events unequivocally.

² One may write here $p_r(x)$ instead of $p_{r(\alpha_1, \alpha_2, \ldots, \alpha_r)}(x)$.

4. Introduce in the usual way the sums of the $p_i$, $p_{ij}$, etc.:

(5)    $S_1 = \sum_i p_i, \quad S_2 = \sum_{i<j} p_{ij}, \quad \ldots, \quad S_n = p_{12\cdots n}, \quad \text{and } S_0 = 1.$

Now add in system (1) first the $n$ equations which define $p_1, p_2, \ldots, p_n$, then the equations for the $p_{ij}$, etc. Observing that $p_n(x)$ is the sum of all those elementary probabilities $v(x_1, x_2, \ldots, x_n)$ with exactly $x$ “1”s and $(n - x)$ “0”s, we get as the result of these $n$ additions the well known formulae

(6)    $S_\gamma = \sum_{x=\gamma}^{n} \binom{x}{\gamma} p_n(x), \quad (\gamma = 0, 1, \ldots, n).$

Here $\gamma = 0$ gives $S_0 = 1 = \sum_{x=0}^{n} p_n(x)$. We may solve these $(n + 1)$ linear equations successively with respect to $p_n(n), p_n(n - 1), \ldots, p_n(0)$, each linear equation containing only one new unknown, and find

(7)    $p_n(x) = \sum_{\gamma=x}^{n} (-1)^{\gamma-x} \binom{\gamma}{x} S_\gamma, \quad (x = 0, 1, \ldots, n).$

(These formulae could also have been derived from (2) by collecting groups of equations such that all the corresponding $v(x_1, \ldots, x_n)$ contain the same number of “1”s.) (In the measure interpretation, $p_n(x)$ is the measure of that part which belongs to exactly $x$ of the original sets, and $S_\gamma$ measures the set which belongs to at least $\gamma$ of these sets.) We also find, by “cumulating” equations (6),

(8)    $S_\gamma = \sum_{x=\gamma}^{n} \binom{x-1}{\gamma-1} V_n(x), \quad (\gamma = 1, 2, \ldots, n),$

and the inverse system

(9)    $V_n(x) = \sum_{\gamma=x}^{n} (-1)^{\gamma-x} \binom{\gamma-1}{x-1} S_\gamma, \quad (x = 1, 2, \ldots, n).$

(6) and (8) are of the same type as (1), and (7) and (9) are of the same type as (2). We may also deduce analogous formulae by interchanging the roles of 0 and 1 and introducing a system of $T_1, T_2, \ldots, T_n$ which depends on the $q$'s in the same way as the $S_1, S_2, \ldots, S_n$ defined in (5) depend on the $p$'s.
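The four relations (6) to (9) are plain binomial-weighted sums, so they can be checked against each other. The sketch below uses our own function names and verifies them on the Bernoulli case of §3, where $S_\gamma = \binom{n}{\gamma} p^\gamma$. The choice $n = 5$, $p = .3$ is arbitrary.

```python
from math import comb


def s_from_p(p):
    """(6): S_gamma = sum_{x >= gamma} C(x, gamma) p_n(x), gamma = 0..n (S_0 = 1)."""
    n = len(p) - 1
    return [sum(comb(x, g) * p[x] for x in range(g, n + 1)) for g in range(n + 1)]


def p_from_s(S):
    """(7): p_n(x) = sum_{gamma >= x} (-1)^(gamma - x) C(gamma, x) S_gamma."""
    n = len(S) - 1
    return [sum((-1) ** (g - x) * comb(g, x) * S[g] for g in range(x, n + 1))
            for x in range(n + 1)]


def s_from_at_least(V):
    """(8): S_gamma = sum_{x >= gamma} C(x - 1, gamma - 1) V_n(x), gamma = 1..n; V[0] is V_n(1)."""
    n = len(V)
    return [sum(comb(x - 1, g - 1) * V[x - 1] for x in range(g, n + 1)) for g in range(1, n + 1)]


def at_least_from_s(S):
    """(9): V_n(x) = sum_{gamma >= x} (-1)^(gamma - x) C(gamma - 1, x - 1) S_gamma, x = 1..n."""
    n = len(S) - 1
    return [sum((-1) ** (g - x) * comb(g - 1, x - 1) * S[g] for g in range(x, n + 1))
            for x in range(1, n + 1)]


n, prob = 5, 0.3
p = [comb(n, x) * prob ** x * (1 - prob) ** (n - x) for x in range(n + 1)]  # Bernoulli case
S = s_from_p(p)
V = at_least_from_s(S)
print([round(s, 6) for s in S])
```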

We have seen that the $p_n(x)$ are $(n + 1)$ arbitrary non-negative numbers, subject only to the condition that their sum equals one. But the $S_\gamma$ $(\gamma = 0, \ldots, n)$ are not arbitrary, as we see from (7). The $(n + 1)$ expressions on the right side of (7) must each be non-negative if they are to define the probabilities $p_n(x)$ (their sum is identically equal to one). Then, and only then, do they define a system of arbitrarily linked events $E_1, \ldots, E_n$.

The $p_n(x)$, $(x = 0, 1, \ldots, n)$, are of course not equivalent to the complete system of $2^n - 1$ values $v(x_1, x_2, \ldots, x_n)$, and the same remark holds for the $S_0, \ldots, S_n$ and the system of $p_i, p_{ij}, \ldots, p_{12\cdots n}$. But we are often particularly interested in problems dealing only with the $p_n(x)$ (and $S_\gamma$). (For instance, the author has studied [2] the asymptotic behavior of $p_n(x)$ as $n$ tends in different ways towards infinity.) The simplest way to indicate a particular $p$-system corresponding to given $S_\gamma$ is of course to assume all the $p_i$ equal to each other, all the $p_{ij}$ equal to each other, etc., and therefore to put

$p_1 = p_2 = \cdots = p_n = \dfrac{1}{n} S_1, \qquad p_{ij} = S_2 \Big/ \dbinom{n}{2}, \qquad \text{etc.}$

In the corresponding $v$-system, all those $v$'s which show the same number of “1”s are equal to each other.

We see from ( 6 ) that the S y (multiplied by 7 !) are the factorial moments of 
order 0 , 1 , • ■ n of the distribution p n (x). Therefore by (7) we get the p n (x) 
in terms of their factorial moments up to order n. We may therefore also say: 
Necessary and sufficient conditions that a system of numbers No = 1, Ni, • • • N n 
be the factorial moment of an arithmetical distribution with at most (n + 1) steps 
at x = 0, 1, • • * n are the inequalities. 

* ■« ('—1')?+* 

( 10 ) £ r f 7 ~ r,y T SO, (* = 0 , 1 , • • • n). 

Note that here there is no longer any allusion to a set of arbitrary events. The inequalities (10) are the necessary and sufficient conditions for a set of $(n + 1)$ numbers to be the $(n + 1)$ (factorial) moments of an arbitrary arithmetic distribution with given abscissae. The linear inequalities (10) differ greatly from the basic inequalities in the classical problem of moments, because in our problem the abscissae of the steps are given in advance.
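As an illustration of (10), the sketch below does two things. It computes the factorial moments $N_\gamma = \sum_x x(x-1)\cdots(x-\gamma+1)\,p(x)$ of a distribution on $0, 1, \dots, n$. It then inverts them with $p(x) = \sum_{\gamma=x}^{n} (-1)^{\gamma+x} N_\gamma / (x!\,(\gamma-x)!)$, the left-hand side of (10). Checking (10) therefore amounts to checking that every recovered $p(x)$ is non-negative. The helper names (`falling`, `factorial_moments`, `probs_from_factorial_moments`, `satisfies_10`) are hypothetical. Exact rational arithmetic with `fractions.Fraction` is an assumed design choice that avoids rounding in the alternating sums.

```python
from fractions import Fraction
from math import factorial


def falling(x, k):
    """Falling factorial x(x-1)...(x-k+1); equals 1 when k == 0."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def factorial_moments(p):
    """N_gamma = sum_x x(x-1)...(x-gamma+1) p(x) for gamma = 0..n."""
    n = len(p) - 1
    return [sum(falling(x, g) * p[x] for x in range(n + 1)) for g in range(n + 1)]


def probs_from_factorial_moments(N):
    """Left-hand side of (10): p(x) = sum_{gamma >= x} (-1)^(gamma+x) N_gamma / (x! (gamma-x)!)."""
    n = len(N) - 1
    return [
        sum(Fraction((-1) ** (g + x)) * N[g] / (factorial(x) * factorial(g - x))
            for g in range(x, n + 1))
        for x in range(n + 1)
    ]


def satisfies_10(N):
    """True when all n + 1 inequalities (10) hold."""
    return all(v >= 0 for v in probs_from_factorial_moments(N))


# Binomial(2, 1/2): p = (1/4, 1/2, 1/4) has factorial moments N = (1, 1, 1/2).
p = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
N = factorial_moments(p)
print(N, probs_from_factorial_moments(N))
print(satisfies_10([1, 2, 1]))  # the recovered p(0) is -1/2, so (10) fails
```

The round trip recovers $p$ exactly. The numbers $(1, 2, 1)$ are rejected even though the recovered values sum to one, because one of them is negative.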

6. In some problems (e.g., some questions connected with the law of large numbers, with correlation theory, or with analysis of variance) we are concerned only with the first and second moments of a distribution. We are thus led to the following question. Given $r + 1$ numbers $N_0, N_1, \dots, N_r$ $(r \le n)$, indicate a set of necessary and sufficient conditions for these numbers to be the moments of an arithmetic distribution with at most $(n + 1)$ steps at $0, 1, 2, \dots, n$.⁵ Some sort of answer, which may work well in particular cases, can be deduced immediately from (10): "$r + 1$ numbers $N_0, N_1, \dots, N_r$ will be the factorial moments of an arithmetic distribution with at most $(n + 1)$ steps at $0, 1, 2, \dots, n$ if and only if it is possible to indicate $s$ numbers $N_{r+1}, \dots, N_{r+s}$ $(0 \le s \le n - r)$ such that, for the $r + s + 1$ numbers $N_0, N_1, \dots, N_r, N_{r+1}, \dots, N_{r+s}$, the $r + s + 1$ inequalities

$$\sum_{\gamma = x}^{r+s} \frac{(-1)^{\gamma + x}\, N_\gamma}{x!\,(\gamma - x)!} \ge 0 \qquad (x = 0, 1, \dots, r + s)$$

are satisfied."

⁵ This problem and the method of its solution have much in common with a problem studied in R. von Mises' paper [4].

The proof of this statement is self-evident, but the statement itself cannot be considered satisfactory. We obtain a general solution in the following way.

Let $f_1(t), \dots, f_r(t)$ be $r$ functions of the chance variable $t$. Let $v(t)$ be an arithmetic probability with $n$ given attributes $t_1, t_2, \dots, t_n$, and let

$$E[f_\rho(t)] = \sum_{\gamma=1}^{n} f_\rho(t_\gamma)\, v(t_\gamma) = \sum_{\gamma=1}^{n} a_{\gamma\rho}\, v_\gamma = S_\rho \qquad (\rho = 1, 2, \dots, r) \tag{11}$$

be the expectations of $f_\rho(t)$ with respect to $v(t)$. We wish to indicate necessary and sufficient conditions for the $r$ numbers $S_\rho$. For $f_\rho(t) = t^\rho$ we have the problem stated above, where the first $r$ moments are given.

Call $(C)$ the $r$-dimensional curve $x_\rho = f_\rho(t)$, and call $P_1, P_2, \dots, P_n$ the points on $(C)$ with coordinates $f_\rho(t_\gamma) = a_{\gamma\rho}$ $(\rho = 1, \dots, r;\ \gamma = 1, \dots, n)$. Call $S$ the given point with coordinates $S_\rho$. The point $S$ must then be contained in the smallest convex body $(B)$ determined by the $n$ points $P_1, \dots, P_n$, and this condition is necessary and sufficient.

The condition is necessary for the following reason. If we interpret the $v_\gamma$, which are $\ge 0$, as masses at the points $P_\gamma$ with sum equal to one, then $S$ is the center of gravity of these masses, and it is well known that the above-mentioned condition for $S$ must then be fulfilled. The condition is also sufficient: if $S$ is contained in $(B)$, there always exists a simplex of at most $r$ dimensions, consisting of at most $(r + 1)$ of the given points, such that $S$ is the center of gravity of appropriate masses at these points.

If we want to state the inequalities for the $S_\rho$ explicitly, we must know the boundary of $(B)$. This boundary is determined by the planes of support of $(B)$ ("Stützebenen," Minkowski), sometimes called tack planes. A tack plane is a plane which does not separate any two points of the given point set and contains at least one point of this set. A plane is said to separate two points if substituting their coordinates in the equation of the plane gives two values with opposite signs. These definitions enable us to find those points $P_\gamma$ which lie on the boundary of $(B)$, and to determine this boundary.

For example, for $r = 3$ we have to find those triples $i, k, l$ for which the determinant representing the equation of the plane through these three points has the same sign for all other points $P_\gamma$. If the $S_\rho$ are the first three moments about the origin, these determinants become Vandermonde determinants. We then find easily that each boundary plane passes through two neighboring points $P_\gamma, P_{\gamma+1}$ and one of the endpoints $P_1$ or $P_n$. If $r = 2$ and the first two moments are given, the boundary of $(B)$ consists of the polygon $P_1P_2 \cdots P_nP_1$. In either case we then find without difficulty the conditions to be satisfied by $S$, in the form of linear inequalities between the given $S_1, S_2, \dots, S_r$.
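For $r = 2$ with $f_1(t) = t$ and $f_2(t) = t^2$, the polygon boundary can be written out as explicit linear inequalities. The sketch below assumes distinct abscissae. The line through $P_i = (t_i, t_i^2)$ and $P_j = (t_j, t_j^2)$ is $x_2 = (t_i + t_j)x_1 - t_i t_j$. The point $S = (S_1, S_2)$ therefore lies in the polygon $P_1P_2\cdots P_nP_1$ exactly when three conditions hold:

- it lies on or above every edge $P_iP_{i+1}$;
- it lies on or below the closing edge $P_nP_1$;
- it satisfies $t_1 \le S_1 \le t_n$, which matters only in the degenerate case $n = 2$.

The function name and the tolerance `eps` are illustrative assumptions.

```python
def in_moment_polygon(ts, s1, s2, eps=1e-12):
    """Is (s1, s2) = (E t, E t^2) for some distribution on the abscissae ts?

    Tests membership in the polygon P_1 P_2 ... P_n P_1, P_i = (t_i, t_i**2), n >= 2.
    """
    ts = sorted(ts)
    if not ts[0] - eps <= s1 <= ts[-1] + eps:
        return False
    # Lower boundary: the edges joining neighbouring points P_i, P_{i+1}.
    for a, b in zip(ts, ts[1:]):
        if s2 < (a + b) * s1 - a * b - eps:
            return False
    # Upper boundary: the closing edge P_n P_1.
    a, b = ts[0], ts[-1]
    return s2 <= (a + b) * s1 - a * b + eps


ts = [0, 1, 2]
print(in_moment_polygon(ts, 1, 5 / 3))  # uniform distribution on {0, 1, 2}
print(in_moment_polygon(ts, 1, 0.5))    # would need a negative variance
print(in_moment_polygon(ts, 1, 2.5))    # lies above the chord P_3 P_1
```

Because the $P_i$ lie in order on a convex curve, only neighbouring pairs can give lower edges. The whole test is therefore linear in the number of abscissae.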

We get the continuous case as a limit of the discontinuous case as $t_\gamma \to t_{\gamma+1}$, when the points $P(t)$ take up the whole curve $(C)$, e.g., between $t = 0$ and $t = \infty$. The relations between the given $S_\rho$ then become the non-linear inequalities that are well known from the problem of moments.

REFERENCES

[1] Kai Lai Chung, "On the probability of the occurrence of at least m events among n arbitrary events," Annals of Math. Stat., Vol. 12 (1941).

[2] Hilda Geiringer, "Sur les variables aléatoires arbitrairement liées," Revue Interbalcanique, 1938.

[3] Hilda Geiringer, "Bemerkung zur Wahrscheinlichkeit nicht unabhängiger Ereignisse," Revue Interbalcanique, 1939.

[4] R. v. Mises, "The limits of a distribution function if two expected values are given," Annals of Math. Stat., Vol. 10 (1939).


AN INEQUALITY FOR MILL'S RATIO 

By Z. W. Birnbaum 
University of Washington 

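Mr. R. D. Gordon¹ recently proved the inequalities

$$\frac{x}{x^2 + 1}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} < \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\,dt < \frac{1}{x}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad \text{for } x > 0.$$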

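In the present note we show that the lower inequality can be replaced by the better estimate

$$\frac{\sqrt{4 + x^2} - x}{2}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2} < \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-t^2/2}\,dt.$$

This is indeed an improvement. Since $(x^2 + 2)^2 > x^2(x^2 + 4)$, we have $x^2 + 2 > x\sqrt{x^2 + 4}$, which is equivalent to $\frac{\sqrt{4 + x^2} - x}{2} = \frac{2}{\sqrt{4 + x^2} + x} > \frac{x}{x^2 + 1}$ for all $x > 0$.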


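Proof: According to a well-known theorem of Jensen², the following inequality holds for $f(t)$ convex and $g(t) \ge 0$ in the interval $(a, b)$:

$$f\!\left(\frac{\int_a^b t\,g(t)\,dt}{\int_a^b g(t)\,dt}\right) \le \frac{\int_a^b f(t)\,g(t)\,dt}{\int_a^b g(t)\,dt}.$$

Take $a = x$, $b = \infty$, $f(t) = 1/t$ and $g(t) = t\,e^{-t^2/2}$. Since $1/t$ is strictly convex, the inequality then gives

$$\frac{\int_x^\infty t\,e^{-t^2/2}\,dt}{\int_x^\infty t^2 e^{-t^2/2}\,dt} < \frac{\int_x^\infty e^{-t^2/2}\,dt}{\int_x^\infty t\,e^{-t^2/2}\,dt}.$$

Since

$$\int_x^\infty t\,e^{-t^2/2}\,dt = e^{-x^2/2}$$

and

$$\int_x^\infty t^2 e^{-t^2/2}\,dt = x\,e^{-x^2/2} + \int_x^\infty e^{-t^2/2}\,dt,$$

we find, writing $I = \int_x^\infty e^{-t^2/2}\,dt$,

$$\frac{e^{-x^2/2}}{x\,e^{-x^2/2} + I} < \frac{I}{e^{-x^2/2}},$$

and hence

$$I^2 + x\,e^{-x^2/2}\,I - e^{-x^2} > 0.$$

Since $I > 0$, it must exceed the positive root of this quadratic, so that $I > \frac{\sqrt{4 + x^2} - x}{2}\,e^{-x^2/2}$. Dividing by $\sqrt{2\pi}$ gives the asserted inequality.

¹ R. D. Gordon, "Values of Mill's ratio of area to bounding ordinate of the normal probability integral for large values of the argument," Annals of Math. Stat., Vol. 12 (1941), pp. 364-366.

² See, for example, G. H. Hardy, J. E. Littlewood and G. Pólya, Inequalities, Cambridge, 1934, pp. 150-151.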
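The three bounds can be checked numerically. Each one is a statement about the Mills ratio $R(x) = \int_x^\infty e^{-t^2/2}dt \,/\, e^{-x^2/2}$:

- Gordon's lower bound: $x/(x^2+1) < R(x)$;
- Gordon's upper bound: $R(x) < 1/x$;
- the estimate of this note: $(\sqrt{4+x^2} - x)/2 < R(x)$.

The sketch below relies only on Python's `math.erfc`. The sample points are arbitrary, and such a check illustrates the inequalities rather than proving them.

```python
import math


def mills_ratio(x):
    """R(x) = (upper normal tail at x) / (normal density at x)."""
    tail = 0.5 * math.erfc(x / math.sqrt(2.0))
    density = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return tail / density


def gordon_lower(x):
    return x / (x * x + 1.0)


def birnbaum_lower(x):
    return (math.sqrt(4.0 + x * x) - x) / 2.0


def gordon_upper(x):
    return 1.0 / x


for x in (0.25, 0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"{x:5.2f}  {gordon_lower(x):.6f}  {birnbaum_lower(x):.6f}  "
          f"{mills_ratio(x):.6f}  {gordon_upper(x):.6f}")
```

The gain over Gordon's lower bound is largest for moderate $x$; both lower bounds approach $R(x)$ as $x$ grows.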













ADDITIVE PARTITION FUNCTIONS AND A CLASS OF STATISTICAL HYPOTHESES

By J. Wolfowitz

New York City

1. Introduction. The purpose of the first part of this paper is to prove several theorems about a class of functions of partitions which are additive in structure and subject to mild restrictions. These theorems may be regarded as contributions to the theory of numbers. If, however, one makes certain assignments of probabilities to the partitions, the theorems may be expressed as statements about asymptotic distributions. We shall carry out the proofs in this latter, probabilistic language, for the following reasons. The discussion will be more concise, and certain circumlocutions will be avoided. The theorems also have statistical application, and a number of theorems discussed recently in statistical literature are corollaries of one of our theorems.

The second part of this paper will discuss the theory of testing statistical hypotheses when the form of the distribution functions is totally unknown and only continuity is assumed. The exact extension of the likelihood ratio criterion to this case will be given. Approximations to the application of this criterion will be proposed for two problems, one of which applies the results mentioned above. Lastly, in connection with the second problem, a combinatorial problem will be solved which is new and of interest in its own right.

2. Partitions of a single integer. Let $n$ be a positive integer and let $A = (a_1, a_2, \dots, a_s)$ be any sequence of positive integers $a_i$ $(i = 1, 2, \dots, s)$, where $\sum_{i=1}^{s} a_i = n$ and $s$ may be any integer from 1 to $n$. Two sequences $A$ which have different elements, or the same elements arranged in different order, are to be considered distinct. It is then easy to see that there are $2^{n-1}$ sequences $A$.

We shall consider the sequence $A$ as a stochastic variable and assign to all sequences $A$ the same probability, which is therefore $2^{-n+1}$. Let $r_j$ be the number of elements $a_i$ in $A$ which equal $j$ $(j = 1, 2, \dots, n)$, so that $r_j$ is a stochastic variable. Let $k$ be an integer $< n$. The joint distribution of the stochastic variables $r_1, r_2, \dots, r_k$ is then given as follows: the probability that $r_i = b_i$ $(i = 1, 2, \dots, k)$ is


$$2^{-n+1} \sum_{r} \sum \frac{r!}{b_1!\, b_2! \cdots b_k!\, r_{k+1}! \cdots r_n!}, \tag{2.1}$$

where the inner summation is carried out over all sets of non-negative integers $r_{k+1}, \dots, r_n$ such that

$$b_1 + b_2 + \cdots + b_k + r_{k+1} + \cdots + r_n = r, \tag{2.2}$$

$$b_1 + 2b_2 + \cdots + k b_k + (k+1) r_{k+1} + \cdots + n r_n = n. \tag{2.3}$$

(The $b_i$ and $r_i$ are, of course, non-negative integers.)

Let $r = \sum_{i=1}^{n} r_i$ and

$$r_{(k+1)} = \sum_{i=k+1}^{n} r_i \qquad (k < n),$$

so that $r$ and $r_{(k+1)}$ are both stochastic variables. The probability that both

$$r_i = b_i \qquad (i = 1, \dots, k) \tag{2.4}$$

and

$$r_{(k+1)} = b_{(k+1)} \tag{2.5}$$

hold is given by (2.1) with the restriction

$$r_{k+1} + \cdots + r_n = b_{(k+1)} \tag{2.6}$$

added to the restrictions (2.2) and (2.3). With this added restriction, the summation in (2.1) may be performed as follows. Let $t = \sum_{i=1}^{k} i b_i$. It is easy to see that the number of sequences $A$ in which every $a_i > k$, $r = r_{(k+1)} = b_{(k+1)}$, and $\sum a_i = n - t$ is given by the coefficient of $x^{n-t}$ in the purely formal expansion in $x$ of

$$\left(x^{k+1} + x^{k+2} + \cdots\right)^{b_{(k+1)}} = x^{(k+1) b_{(k+1)}} \left(\frac{1}{1 - x}\right)^{b_{(k+1)}}.$$

This coefficient is

$$\binom{n - t - k b_{(k+1)} - 1}{b_{(k+1)} - 1}.$$

Hence $P\{(2.4) \text{ and } (2.5)\}$, where this symbol will always denote the probability of the relation in braces, is seen to be

$$2^{-n+1}\, \frac{\left(\sum_{i=1}^{k} b_i + b_{(k+1)}\right)!}{b_{(k+1)}!\, \prod_{i=1}^{k} b_i!} \binom{n - t - k b_{(k+1)} - 1}{b_{(k+1)} - 1}. \tag{2.7}$$
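Formula (2.7) is easy to test by brute force for small $n$. The $2^{n-1}$ sequences $A$ correspond one-to-one to the subsets of the $n - 1$ gaps between $n$ units at which the sequence is cut. The sketch below enumerates all sequences and tabulates $(r_1, \dots, r_k, r_{(k+1)})$. It then compares the observed frequencies with

$$2^{-n+1}\frac{(\sum b_i + b_{(k+1)})!}{b_{(k+1)}!\prod b_i!}\binom{n - t - k b_{(k+1)} - 1}{b_{(k+1)} - 1}, \qquad t = \sum_{i=1}^{k} i b_i.$$

The helper names are hypothetical. The case $b_{(k+1)} = 0$, where the binomial coefficient degenerates, is handled separately: the count is taken to be 1 exactly when the small parts use up all of $n$. That convention is an assumption.

```python
from collections import Counter
from fractions import Fraction
from math import comb, factorial, prod


def compositions(n):
    """All 2**(n-1) sequences A of positive integers summing to n (bit j set = cut after unit j+1)."""
    for mask in range(1 << (n - 1)):
        parts, length = [], 1
        for j in range(n - 1):
            if (mask >> j) & 1:
                parts.append(length)
                length = 1
            else:
                length += 1
        parts.append(length)
        yield parts


def observed(n, k):
    """Counts of (r_1, ..., r_k, r_(k+1)) over all sequences A."""
    counts = Counter()
    for parts in compositions(n):
        key = tuple(parts.count(i) for i in range(1, k + 1))
        counts[key + (sum(a > k for a in parts),)] += 1
    return counts


def formula_2_7(n, k, bs, big):
    """Probability that r_i = b_i (i <= k) and r_(k+1) = big, as in (2.7)."""
    t = sum(i * b for i, b in enumerate(bs, start=1))
    orderings = factorial(sum(bs) + big) // (factorial(big) * prod(factorial(b) for b in bs))
    if big == 0:
        tails = 1 if t == n else 0  # no parts > k: the small parts must use up all of n
    else:
        tails = comb(n - t - k * big - 1, big - 1)
    return Fraction(orderings * tails, 2 ** (n - 1))


def agrees(n, k):
    """Does (2.7) reproduce every observed joint frequency?"""
    total = 2 ** (n - 1)
    return all(formula_2_7(n, k, key[:k], key[k]) == Fraction(c, total)
               for key, c in observed(n, k).items())


print(agrees(9, 2))
```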


If $X$ is a stochastic variable, let $E(X)$ and $\sigma^2(X)$ denote, respectively, the mean and variance of $X$ (if they exist). If $Y$ is another stochastic variable, let $\sigma(XY)$ be the covariance between $X$ and $Y$. Also let

$$\bar{X} = \frac{X - E(X)}{\sigma(X)}.$$

By the distribution of $X$ we shall mean a function $\varphi(x)$ such that $P\{X < x\} = \varphi(x)$. With these conventions established, we first seek to evaluate $E(r_i)$.
This may be done as follows. Differentiate with respect to $y$ the coefficient of $x^n$ in the purely formal expansion in $x$ of $2^{-n+1}(x + x^2 + \cdots + x^{i-1} + y x^i + x^{i+1} + \cdots)^r$, set $y = 1$, and sum over all values of $r$. At $y = 1$ the derivative of the expansion is $2^{-n+1} r\, x^i \left(x/(1-x)\right)^{r-1}$. We therefore have to evaluate

$$2^{-n+1} \sum_{r} r \binom{n - i - 1}{r - 2},$$

which is easily seen to give the result

$$E(r_i) = (n - i + 3)\, 2^{-i-1} \qquad (i < n), \tag{2.8}$$

while it is obvious that

$$E(r_n) = 2^{-n+1}. \tag{2.9}$$

The variances and covariances of the $r_i$ may be obtained by similar devices. We omit the details of these calculations. We also omit the covariances themselves, since they are not needed for the proof of Theorem 2. The results are:

$$\sigma^2(r_i) = n\left(\frac{1}{2^{i+1}} + \frac{3 - 2i}{2^{2i+2}}\right) + \frac{3 - i}{2^{i+1}} + \frac{3i^2 - 12i + 5}{2^{2i+2}} \qquad (i < \tfrac{1}{2} n). \tag{2.10}$$


The limitation on the value of $i$ is necessary because, beyond it, the processes for summing binomial coefficients with the aid of the device described above no longer apply. The matter is easily settled, however. If $X$ is a stochastic variable which can take only the values 0 or 1, then

$$\sigma^2(X) = E(X) - [E(X)]^2.$$

The $r_i$ for $i > \frac{1}{2} n$ are such variables, so that

$$\sigma^2(r_i) = \frac{n - i + 3}{2^{i+1}} - \frac{(n - i + 3)^2}{2^{2i+2}} \qquad (n > i > \tfrac{1}{2} n), \tag{2.11}$$

$$\sigma^2(r_n) = \frac{2^{n-1} - 1}{2^{2n-2}}. \tag{2.12}$$


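We also have, without difficulty,

$$\sigma^2(r_{n/2}) = \frac{n + 6}{2^{(n+4)/2}} - \frac{(n + 6)^2}{2^{n+4}} + \frac{1}{2^{n-2}} \tag{2.13}$$

when $n$ is even and $> 2$, and

$$E(r) = \tfrac{1}{2}(n + 1), \tag{2.14}$$

$$\sigma^2(r) = \tfrac{1}{4}(n - 1). \tag{2.15}$$

Finally,

$$E(r_{(k+1)}) = (n - k + 1)\, 2^{-k-1}. \tag{2.16}$$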
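The moment formulas (2.8) through (2.16) can likewise be checked exactly for small $n$ by enumerating all $2^{n-1}$ sequences $A$. This is only an illustration by brute force, not the generating-function argument used in the text. The right-hand sides used are written out in the code comments, and the function names are hypothetical.

```python
from fractions import Fraction


def compositions(n):
    """All 2**(n-1) sequences A summing to n, via the cut pattern of the n-1 gaps."""
    for mask in range(1 << (n - 1)):
        parts, length = [], 1
        for j in range(n - 1):
            if (mask >> j) & 1:
                parts.append(length)
                length = 1
            else:
                length += 1
        parts.append(length)
        yield parts


def mean_var(values):
    """Exact mean and variance of equally likely values."""
    m = Fraction(sum(values), len(values))
    return m, Fraction(sum(v * v for v in values), len(values)) - m * m


def var_r_i(n, i):
    """sigma^2(r_i) from (2.10)-(2.13)."""
    e = Fraction(n - i + 3, 2 ** (i + 1))                       # E(r_i), (2.8)
    if 2 * i < n:                                               # (2.10)
        return e + Fraction((3 - 2 * i) * n + 3 * i * i - 12 * i + 5, 2 ** (2 * i + 2))
    if 2 * i == n:                                              # (2.13), n even
        return (Fraction(n + 6, 2 ** (i + 2)) - Fraction((n + 6) ** 2, 2 ** (n + 4))
                + Fraction(1, 2 ** (n - 2)))
    if i < n:                                                   # (2.11): r_i is 0 or 1
        return e - e * e
    return Fraction(2 ** (n - 1) - 1, 2 ** (2 * n - 2))         # (2.12)


def check(n):
    comps = list(compositions(n))
    for i in range(1, n + 1):
        m, v = mean_var([c.count(i) for c in comps])
        expected = Fraction(n - i + 3, 2 ** (i + 1)) if i < n else Fraction(1, 2 ** (n - 1))
        assert m == expected and v == var_r_i(n, i), (n, i)    # (2.8)-(2.13)
    m, v = mean_var([len(c) for c in comps])                   # r = number of parts
    assert m == Fraction(n + 1, 2) and v == Fraction(n - 1, 4)  # (2.14), (2.15)
    for k in range(1, n):
        m, _ = mean_var([sum(a > k for a in c) for c in comps])
        assert m == Fraction(n - k + 1, 2 ** (k + 1))            # (2.16)
    return True


print(all(check(n) for n in range(3, 13)))
```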

The next result we shall need may be stated as follows.

Theorem 1: As $n$ approaches infinity, the joint distribution of the stochastic variables $\bar r_1, \dots, \bar r_k, \bar r_{(k+1)}$ ($k$ any fixed positive integer) approaches the multivariate normal distribution.

This theorem is proved as follows. Make the substitutions

$$x_i = \frac{r_i - n \cdot 2^{-i-1}}{\sqrt{n}} \quad (i = 1, 2, \dots, k), \qquad x_{(k+1)} = \frac{r_{(k+1)} - n \cdot 2^{-k-1}}{\sqrt{n}}$$

in the expression

$$2^{-n+1}\, \frac{\left(\sum_{i=1}^{k} r_i + r_{(k+1)}\right)!}{r_{(k+1)}!\, \prod_{i=1}^{k} r_i!} \binom{n - t - k r_{(k+1)} - 1}{r_{(k+1)} - 1},$$

which comes from (2.7), and regard $t$ as equal to $\sum_{i=1}^{k} i r_i$. Replace the various factorials by their asymptotic approximations given by Stirling's formula, and simplify the resulting expression. The subsequent procedure is simple but laborious, and we omit the details. They are like those of the classical proof of De Moivre's theorem as given, for example, in Fréchet [1], p. 80.

We now prove the following theorem on additive partition functions.

Theorem 2: Let $f(x)$ be a function defined for all positive integral values of $x$ which fulfills the following conditions:

(a) there exists a pair of positive integers, $a$ and $b$, such that

$$\frac{f(a)}{f(b)} \ne \frac{a}{b}; \tag{2.17}$$

(b) the series

$$\sum_{i=1}^{\infty} |f(i)|\, 2^{-i/2} \tag{2.18}$$

converges.

Let $F(A)$, a function of the stochastic sequence $A$, be defined as follows:

$$F(A) = \sum_{i=1}^{s} f(a_i). \tag{2.19}$$

Then, for any real $y$, the probability of the inequality $\bar F(A) < y$ approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt$$

as $n \to \infty$.

We restate this theorem without the use of probabilistic terms:

Let $A$ be any sequence of positive integers whose sum is a given integer $n$. Consider two sequences $A$ to be different if they contain different elements, or the same elements arranged in a different order. Let $f(x)$ and $F(A)$ be defined as above, subject to the same restrictions. Then there exist, for every positive integer $n$, two numbers $E_n$ and $\sigma_n$ with the following property. Take $2^{-n+1}$ times the number of sequences $A$ for which the inequality

$$F(A) - E_n < y\,\sigma_n$$

holds. This quantity approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt$$

as $n \to \infty$.
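Theorem 2 can be illustrated by simulation, though not proved. Cutting each of the $n - 1$ gaps independently with probability $\frac12$ gives every sequence $A$ the probability $2^{-n+1}$, as in the text. The sketch below assumes the example $f(x) = \sqrt{x}$. This choice satisfies (2.17) with $a = 1$, $b = 4$, and it makes (2.18) converge. The sketch replaces $E_n$ and $\sigma_n$ by the sample mean and standard deviation. The sample size, seed and evaluation points are arbitrary choices.

```python
import math
import random
import statistics


def random_sequence(n, rng):
    """Uniform random sequence A: each of the n-1 gaps is cut with probability 1/2."""
    bits = rng.getrandbits(n - 1) if n > 1 else 0
    parts, length = [], 1
    for j in range(n - 1):
        if (bits >> j) & 1:
            parts.append(length)
            length = 1
        else:
            length += 1
    parts.append(length)
    return parts


def empirical_cdf_of_standardized_F(n, f, ys, samples=4000, seed=1):
    """Fraction of simulated A with (F(A) - mean) / sd < y, for each y in ys."""
    rng = random.Random(seed)
    values = [sum(f(a) for a in random_sequence(n, rng)) for _ in range(samples)]
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return {y: sum((v - mu) / sd < y for v in values) / samples for y in ys}


def normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


emp = empirical_cdf_of_standardized_F(400, math.sqrt, (-1.0, 0.0, 1.0))
for y, p in emp.items():
    print(f"y = {y:+.1f}: simulated {p:.3f}, normal {normal_cdf(y):.3f}")
```

A function proportional to $x$, such as $f(x) = 2x$, violates (2.17). It gives $F(A) = 2n$ for every $A$, so the standardization above would divide by zero. This is the degenerate case excluded in the proof of Lemma 4.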

For convenience, the proof will be divided into a number of lemmas.

If $\varphi(y)$ is any continuous distribution function, it is well known that $\varphi(y)$ is uniformly continuous. Consequently, for any arbitrarily small positive $\varepsilon$, there exist two positive numbers, $h$ and $D$, with the following properties:

(a) if $y_1$ and $y_2$ are any real numbers such that $|y_1 - y_2| < h$, then $|\varphi(y_1) - \varphi(y_2)| < \varepsilon$;

(b) if $y$ is such that $|y| > D$, then $\varphi(|y|) > 1 - \varepsilon$ and $\varphi(-|y|) < \varepsilon$.

We now first prove

Lemma 1. Let $X$ and $Y$ be two stochastic variables, both of which possess finite means and variances. Suppose that there exist a continuous distribution function $\varphi(y)$ and two small positive numbers $\varepsilon$ and $\delta$ (say $\varepsilon < 1/10$, $\delta < 1/10$) such that

$$|P\{\bar X < y\} - \varphi(y)| < \varepsilon \tag{2.20}$$

for all $y$, and

$$\frac{\sigma(Y)}{\sigma(X)} < \delta. \tag{2.21}$$

Let $h$ and $D$ be chosen as above for $\varphi(y)$, with the additional proviso that $h < \frac{1}{2}$ and $D > 1$. Suppose further that

(2.22)

Then

$$\left|P\{\overline{(X+Y)} < y\} - \varphi(y)\right| < 3\varepsilon \tag{2.23}$$

for all $y$.


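Proof. We have

$$\sigma^2(X + Y) = \sigma^2(X) + 2\sigma(XY) + \sigma^2(Y).$$

Since, as is well known,

$$|\sigma(XY)| \le \sigma(X)\,\sigma(Y),$$

it follows from (2.21) that

$$\sigma(X + Y) = (1 + \delta')\,\sigma(X), \tag{2.24}$$

where $|\delta'| < \delta$. Hence

$$\overline{(X + Y)} = \frac{X - E(X)}{\sigma(X + Y)} + \frac{Y - E(Y)}{\sigma(X + Y)}. \tag{2.25}$$

Let $d = h/4$. It then follows from Tchebycheff's inequality and (2.21) that

$$P\left\{\frac{|Y - E(Y)|}{\sigma(X + Y)} \ge d\right\} \le \frac{\sigma^2(Y)}{d^2\,\sigma^2(X + Y)} < \frac{16\,\delta^2}{h^2 (1 - \delta)^2} \tag{2.26}$$

and

$$P\left\{\frac{|Y - E(Y)|}{\sigma(X + Y)} \ge d\right\} < \varepsilon. \tag{2.27}$$

Now

$$P\left\{\frac{X - E(X)}{\sigma(X+Y)} < y - d\right\} - \varepsilon < P\{\overline{(X+Y)} < y\} < P\left\{\frac{X - E(X)}{\sigma(X+Y)} < y + d\right\} + \varepsilon. \tag{2.28}$$

Hence, from (2.24),

$$P\{\bar X < (y - d)(1 + \delta')\} - \varepsilon < P\{\overline{(X+Y)} < y\} < P\{\bar X < (y + d)(1 + \delta')\} + \varepsilon, \tag{2.29}$$

and consequently, from (2.20),

$$\varphi(y - d + y\delta' - d\delta') - 2\varepsilon < P\{\overline{(X+Y)} < y\} < \varphi(y + d + y\delta' + d\delta') + 2\varepsilon. \tag{2.30}$$

Now if $|y| \le 2D$, then from (2.22)

$$d + |y\delta'| + d|\delta'| < \frac{h}{4} + \frac{h}{2} + \frac{h}{4} = h,$$

and if $|y| > 2D$, then, also from (2.22),

$$|y| - d - |y\delta'| - d|\delta'| > |y|(1 - \delta) - \frac{h}{2} > \frac{1}{2}|y| > D.$$

Recalling the definitions of $h$ and $D$, it follows from (2.30) that, for all $y$,

$$\varphi(y) - 3\varepsilon < P\{\overline{(X+Y)} < y\} < \varphi(y) + 3\varepsilon. \tag{2.31}$$

This proves Lemma 1.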

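Lemma 2: For any fixed pair $a$, $b$ of positive integers such that $a < b$,

$$\lim_{n \to \infty} \frac{[E(r_a)]^b}{[E(r_b)]^a\,[E(r)]^{b-a}} = 1. \tag{2.32}$$

Proof. From (2.8), for fixed $i$,

$$\frac{1}{n} E(r_i) \to 2^{-i-1},$$

and from (2.14), $\frac{1}{n} E(r) \to \frac{1}{2}$ as $n \to \infty$. The required result follows easily.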
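This sketch assumes that the ratio in (2.32) is $[E(r_a)]^b / ([E(r_b)]^a [E(r)]^{b-a})$. Its limit can then be seen directly from the exact means (2.8) and (2.14). The leading factor of the numerator is $n^b 2^{-(a+1)b}$. The leading factor of the denominator is $n^a 2^{-(b+1)a} \cdot n^{b-a} 2^{-(b-a)}$, which is the same. The sketch evaluates the ratio exactly for growing $n$; the function name is hypothetical.

```python
from fractions import Fraction


def lemma2_ratio(n, a, b):
    """[E(r_a)]^b / ([E(r_b)]^a [E(r)]^(b-a)), with E(r_i) from (2.8) and E(r) from (2.14)."""
    e_a = Fraction(n - a + 3, 2 ** (a + 1))
    e_b = Fraction(n - b + 3, 2 ** (b + 1))
    e_r = Fraction(n + 1, 2)
    return e_a ** b / (e_b ** a * e_r ** (b - a))


for n in (10, 1_000, 1_000_000):
    print(n, float(lemma2_ratio(n, 1, 3)), float(lemma2_ratio(n, 2, 5)))
```

For $a = 1$, $b = 2$ the ratio is exactly $(n+2)^2/(n+1)^2$. This makes concrete the remark, in the proof of Lemma 4, that the lemma is trivial in that case.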

For any $n$ we now define

$$B(k;n) = \sum_{i=1}^{k} r_i f(i)$$

and

$$C(k;n) = \sum_{i=k+1}^{n} r_i f(i).$$

Then

$$F(A) = B(k;n) + C(k;n).$$

Lemma 3: For any real $y$ and any fixed positive integer $k$, the probability that the stochastic variable $\bar B(k;n)$ fulfills the inequality $\bar B(k;n) < y$ approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt.$$

Proof. By Theorem 1, the stochastic variables $\bar r_1, \bar r_2, \dots, \bar r_k, \bar r_{(k+1)}$ are asymptotically jointly normally distributed. As an immediate consequence, so are the variables $\bar r_1, \bar r_2, \dots, \bar r_k$. Hence $B(k;n)$, which is a linear function of $r_1, r_2, \dots, r_k$ with constant coefficients $f(1), f(2), \dots, f(k)$, is asymptotically normally distributed.

Lemma 4. There exists a constant $c > 0$ such that, for all $n$ sufficiently large,

$$\sigma^2(F(A)) > cn. \tag{2.33}$$

Proof. For any sufficiently large, arbitrary, but fixed $n$, we will construct two sets, $S_1$ and $S_2$, of sequences $A$ with the following properties. $S_1$ and $S_2$ have the same probability $p$, and $p$ is always greater than $\beta$, a fixed positive constant which does not depend on $n$. Since the probabilities of $S_1$ and $S_2$ are equal, each possesses the same number of sequences $A$. We will establish a one-to-one correspondence between the member sequences of $S_1$ and $S_2$ such that, if $A_1$ is a member of $S_1$ and $A_2$ is its corresponding sequence in $S_2$, then

$$|F(A_1) - F(A_2)| > 2d\sqrt{n}, \tag{2.34}$$

where $d$ is a fixed positive constant which does not depend on $n$.

It is easy to see that such a construction would prove the lemma. The probability of any sequence $A$ is $2^{-n+1}$. For any two values $u$ and $v$ and any mean $\mu$, we have $(u - \mu)^2 + (v - \mu)^2 \ge \frac{1}{2}(u - v)^2$. Hence, by (2.34), the contribution of a corresponding pair $A_1$, $A_2$ to the variance of $F(A)$ is not less than $2^{-n+2} d^2 n$. The contribution of the sets $S_1$ and $S_2$ is therefore not less than $2\beta d^2 n$.

It remains, then, to carry out the construction of $S_1$ and $S_2$. To simplify the notation, we shall carry out the construction under the assumption that the integers $a$ and $b$ of (2.17) are 1 and 2. It will be readily apparent, however, that the proof is perfectly general and holds, with trivial changes, for any pair $a, b$. This lemma is the only place where the hypothesis (2.17) is used. The condition is necessary for the following reason. Suppose that for every pair of positive integers $i$ and $j$

$$\frac{f(i)}{f(j)} = \frac{i}{j}.$$

Then $F(A)$ is a constant multiple of $n$: since $n = \sum_i i r_i$,

$$F(A) = \sum_i f(a_i) = \sum_i r_i f(i) = f(1) \sum_i i r_i = n f(1).$$

Each sequence $A$ uniquely determines the "coordinate" complex

$$[r_1, r_2, \dots, r_n],$$

which we prefer to write as the pair $L = (l, l')$:

$$l = [r_1, r_2], \qquad l' = [r_3, r_4, \dots, r_n].$$

To each pair $(l, l')$ there correspond, in general, many sequences $A$, and their exact number may be given explicitly in terms of factorials. The totality of all $A$ whose $L$ have the same second member $l'$ will be called the group determined by $l'$, or just the group $l'$. The subset of a group $l'$ all of whose $A$ have the same $r_1$ will be called the family $(l', r_1)$. All the $A$ in the same family have the same $L$, because $l'$ and $r_1$ determine $r_2$ through the equation $\sum_i i r_i = n$.

According to Theorem 1 for $k = 2$, the variables $\bar r_1, \bar r_2, \bar r_{(3)}$ are asymptotically jointly normally distributed. Let

$$\sigma_1 = \lim_{n \to \infty} \frac{\sigma(r_1)}{\sqrt{n}}.$$

The limiting variances of $r_2$ and $r_{(3)}$ are constant multiples of $n\sigma_1^2$. Let $H$ be the set of all $A$ whose $L$ satisfy the constraints

$$\frac{n}{4} < r_1 < \frac{n}{4} + \sqrt{n}\,\sigma_1, \qquad \frac{n}{8} < r_2 < \frac{n}{8} + \sqrt{n}\,\sigma_1, \qquad \frac{n}{8} < r_{(3)} < \frac{n}{8} + \sqrt{n}\,\sigma_1. \tag{2.35}$$

The limiting correlation coefficients of the variables $r_1, r_2, r_{(3)}$ are all less than 1 in absolute value. Therefore $H$ has a positive probability, which exceeds a fixed positive constant $\gamma$ for sufficiently large $n$.

If any member sequence $A$ of a family is in $H$, the entire family is obviously in $H$. Any sequence $A$ belongs to one and only one family. Hence the set $H$ may be decomposed in a disjunct way into entire families. Let $\left(l', \frac{n}{4} + h_1\right)$ be any family in $H$, where of course $0 < h_1 < \sqrt{n}\,\sigma_1$. Consider the (second) family $\left(l', \frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1\right)$, which is not in $H$. We now wish to show that the probability of the second family exceeds $c'$ times the probability of the first family. Here $c'$ is a fixed positive constant which depends neither on $n$ nor on the particular families in question.

For the first family, let

$$r_1 = \frac{n}{4} + h_1, \qquad r_2 = \frac{n}{8} + h_2, \qquad r_{(3)} = \frac{n}{8} + h_3, \qquad r = \frac{n}{2} + h_1 + h_2 + h_3.$$

Hence

$$0 < h_i < \sqrt{n}\,\sigma_1 \qquad (i = 1, 2, 3). \tag{2.36}$$

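Since both families are in the same group, we therefore have for the second family

$$r_1 = \frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1, \qquad r_2 = \frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2, \qquad r_{(3)} = \frac{n}{8} + h_3, \qquad r = \frac{n}{2} + \sqrt{n}\,\sigma_1 + h_1 + h_2 + h_3.$$

The ratio of the probability of the second family to that of the first family equals the ratio of the numbers of sequences $A$ in the two families. Since both families are in the same group, elementary combinatorics shows that this ratio is

$$\frac{\left(\frac{n}{2} + \sqrt{n}\,\sigma_1 + h_1 + h_2 + h_3\right)!\,\left(\frac{n}{4} + h_1\right)!\,\left(\frac{n}{8} + h_2\right)!}{\left(\frac{n}{2} + h_1 + h_2 + h_3\right)!\,\left(\frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1\right)!\,\left(\frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2\right)!}, \tag{2.37}$$

and hence exceeds

$$\frac{\left(\frac{n}{2} + h_1 + h_2 + h_3\right)^{\sqrt{n}\,\sigma_1} \left(\frac{n}{8} - \sqrt{n}\,\sigma_1 + h_2\right)^{\sqrt{n}\,\sigma_1}}{\left(\frac{n}{4} + 2\sqrt{n}\,\sigma_1 + h_1\right)^{2\sqrt{n}\,\sigma_1}}. \tag{2.38}$$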


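Had we been using general numbers $a$ and $b$ of (2.17), we would make use of Lemma 2 at this point. In the present case the result of that lemma is trivial. It is easy to see, therefore, that (2.38) equals

$$\left[\frac{\left(1 + \frac{2h_1 + 2h_2 + 2h_3}{n}\right)\left(1 - \frac{8\sqrt{n}\,\sigma_1 - 8h_2}{n}\right)}{\left(1 + \frac{8\sqrt{n}\,\sigma_1 + 4h_1}{n}\right)^{2}}\right]^{\sqrt{n}\,\sigma_1}. \tag{2.39}$$

In view of (2.36), this exceeds

$$\left(1 + \frac{12\sigma_1}{\sqrt{n}}\right)^{-2\sqrt{n}\,\sigma_1} \left(1 - \frac{8\sigma_1}{\sqrt{n}}\right)^{\sqrt{n}\,\sigma_1}, \tag{2.40}$$

which, in turn, exceeds the following for sufficiently large $n$:

$$\tfrac{1}{2} e^{-32\sigma_1^2} = c'. \tag{2.41}$$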

We are now ready to construct $S_1$ and $S_2$. Let

$$f_1 = (l', r_1)$$

be any family in $H$, and consider the family

$$f_2 = (l', r_1 + 2\sqrt{n}\,\sigma_1).$$

Select in any manner whatsoever $c'v$ of the sequences $A$ in $f_1$, where $v$ is the total number of sequences in $f_1$, and call this set of sequences $f^*$. Select in any manner whatsoever $c'v$ sequences from $f_2$, and call this set $f^{**}$. By (2.41), $f_2$ contains at least $c'v$ sequences. Establish, in any manner whatsoever, a one-to-one correspondence between the sequences of $f^*$ and $f^{**}$. Suppose $A_1$ and $A_2$ are corresponding sequences. Since $f^*$ and $f^{**}$ belong to the same group, and since $f(2) \ne 2f(1)$, we have

$$|F(A_1) - F(A_2)| = |f(2)\sqrt{n}\,\sigma_1 - 2f(1)\sqrt{n}\,\sigma_1| = |f(2) - 2f(1)|\,\sqrt{n}\,\sigma_1, \tag{2.42}$$

so that (2.34) holds with

$$d = \tfrac{1}{2}\,|f(2) - 2f(1)|\,\sigma_1. \tag{2.43}$$

Now proceed in this manner for all the families $f_1$ in $H$. The union of all the sets $f^*$ is the set $S_1$, and the union of all the sets $f^{**}$ is the set $S_2$. Since the probability of $H$ exceeds $\gamma$, it is clear that the probability $p$ of $S_1$ exceeds $\beta = c'\gamma$. This proves Lemma 4.

Lemma 5. For any arbitrarily small positive number $\xi$ there exists a positive integer $p(\xi)$ such that, for any $k > p(\xi)$ and all $n$ greater than a fixed lower bound,

$$\sigma^2[C(k;n)] < \xi n. \tag{2.44}$$

Proof. Since

$$C(k;n) = \sum_{i=k+1}^{n} r_i f(i)$$

and, as is well known,

$$|\sigma(XY)| \le \sigma(X)\,\sigma(Y),$$

we have

$$\sigma^2[C(k;n)] \le \left(\sum_{i=k+1}^{n} |f(i)|\,\sigma(r_i)\right)^2. \tag{2.45}$$

From (2.10) it follows readily that 

( 2 - 46 ) < \ + 2^i +( + JStt*) > 

and the quantity in parentheses in the right member of (2.46) is easily seen to be negative. Hence, for $i < \frac{1}{2} n$ and $n > 3$,

$$\sigma(r_i) < \sqrt{2n}\; 2^{-i/2}. \tag{2.47}$$

From (2.11) and the definition of $r_i$, it follows easily that (2.47) also holds when $i > \frac{1}{2} n$ and $n > 3$.

Hence, in view of (2.12), (2.13), and the convergence of the series (2.18), the desired result follows from (2.45).

Lemma 6. Let the $\xi$ of Lemma 5 be $< \frac{1}{4} c$, where $c$ is as in Lemma 4. Then, for $k > p(\xi)$ and $n$ larger than a fixed lower bound,

$$\sigma^2(B(k;n)) > \tfrac{1}{4} c n. \tag{2.48}$$


Proof: Since

$$F(A) = B(k;n) + C(k;n),$$

we have

$$\sigma^2(F(A)) = \sigma^2(B(k;n)) + \sigma^2(C(k;n)) + 2\sigma(BC) \le \sigma^2(B) + \sigma^2(C) + 2\sigma(B)\sigma(C) = (\sigma(B) + \sigma(C))^2.$$

Hence, from (2.33) and (2.44), $\sqrt{cn} < \sigma(B) + \frac{1}{2}\sqrt{cn}$, and the required result follows.

Proof of the Theorem: Let $\varepsilon$ be an arbitrarily small positive number. By Lemma 3, for all $n$ sufficiently large we have

$$\left|P\{\bar B(k;n) < y\} - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt\right| < \varepsilon$$

for all $y$. Let $\xi$ be a small number, to be chosen later. For large enough $k$ and $n$, Lemmas 5 and 6 give

$$\frac{\sigma(C(k;n))}{\sigma(B(k;n))} < \sqrt{\frac{4\xi}{c}} = \delta. \tag{2.49}$$

Now let the $\varphi(y)$ of Lemma 1 be defined as

$$\varphi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt,$$

and choose $h$ and $D$ as in Lemma 1 for our present $\varepsilon$. Since $c$ is fixed and $\xi$ is still at our disposal, choose $\xi$ small enough that the $\delta$ of (2.49) satisfies (2.22). The hypothesis of Lemma 1 is then satisfied. Hence, from (2.23) and Lemma 3, for all $n$ sufficiently large,

$$|P\{\bar F(A) < y\} - \varphi(y)| < 3\varepsilon$$

for all $y$. This is the required result.


3. Partitions of two integers. Let $n_1$ and $n_2$ be positive integers with $n_1 + n_2 = n$, and write $\frac{n_1}{n} = e_1$, $\frac{n_2}{n} = e_2$, and $e = \max(e_1, e_2)$. Let $V = (v_1, v_2, \dots, v_s)$ be any sequence of positive integers $v_i$ $(i = 1, 2, \dots, s)$ in which $v_1 + v_3 + v_5 + \cdots$ equals one of $n_1$ and $n_2$, while $v_2 + v_4 + v_6 + \cdots$ equals the other. Such sequences are of statistical importance (cf. Wald and Wolfowitz [2]). As before, sequences $V$ with different elements, or with the same elements in different order, will be considered different. Each sequence $V$ will be assigned the same probability, which is therefore easily seen to be

$$\frac{n_1!\, n_2!}{n!}.$$

Of the two sequences $(v_1, v_3, v_5, \dots)$ and $(v_2, v_4, v_6, \dots)$, take the one whose elements sum to $n_1$. Let $r_{1i}$ be the number of its elements equal to $i$, and let $r_{2i}$ be the corresponding number for the other sequence. Let



$$s_i = r_{1i} + r_{2i}, \qquad r_1 = \sum_i r_{1i}, \qquad r_2 = \sum_i r_{2i},$$

$$s = r_1 + r_2, \qquad r_{1(k+1)} = \sum_{i=k+1}^{n_1} r_{1i}, \qquad r_{2(k+1)} = \sum_{i=k+1}^{n_2} r_{2i}.$$

The necessary computations, of the kind given at the beginning of the previous section, have been performed by Mood [3]. We summarize them as follows:

Theorem 3 (Mood): As $n$ approaches infinity while $e_1$ and $e_2$ remain constant, the joint distribution of the stochastic variables

$$\bar r_{11}, \bar r_{12}, \dots, \bar r_{1k}, \bar r_{1(k+1)}, \bar r_{21}, \bar r_{22}, \dots, \bar r_{2k}, \bar r_{2(k+1)}$$

(where $k$ is any fixed positive integer) approaches the multivariate normal distribution.

Mood (loc. cit.) gives the following parameters, with the convention that

$$x^{(i)} = x(x - 1)(x - 2) \cdots (x - i + 1): \tag{3.1}$$

$$E(r_{1i}) = \frac{(n_2 + 1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}}, \tag{3.2}$$

$$\lim_{n \to \infty} \frac{E(r_{1i})}{n} = e_1^i e_2^2, \tag{3.3}$$

$$\lim_{n \to \infty} \frac{E(r_{1(k+1)})}{n} = e_1^{k+1} e_2, \tag{3.4}$$

$$\sigma^2(r_{1i}) = \frac{n_2^{(2)}\,(n_2 + 1)^{(2)}\, n_1^{(2i)}}{n^{(2i+2)}} + \frac{(n_2 + 1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}} - \left(\frac{(n_2 + 1)^{(2)}\, n_1^{(i)}}{n^{(i+1)}}\right)^2, \tag{3.5}$$

$$\lim_{n \to \infty} \frac{\sigma^2(r_{1i})}{n} = e_1^{2i-1} e_2^3\left[(i + 1)^2 e_1 e_2 - i^2 e_2 - 2e_1\right] + e_1^i e_2^2. \tag{3.6}$$

The corresponding parameters for $r_{2i}$ may be obtained from the above by interchanging $n_1$ and $n_2$. Also

$$\lim_{n \to \infty} \frac{E(r_1)}{n} = \lim_{n \to \infty} \frac{E(r_2)}{n} = e_1 e_2. \tag{3.7}$$
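The formulas $E(r_{1i}) = (n_2+1)^{(2)} n_1^{(i)}/n^{(i+1)}$ and (3.5), and the limit (3.6), can be checked with a short script. The sketch rests on two assumptions:

- all $\binom{n}{n_1}$ arrangements of $n_1$ elements of one kind and $n_2$ of the other are equally likely;
- $r_{1i}$ is counted as the number of maximal runs of length exactly $i$ of the first kind.

The function names are hypothetical. The pair term of the variance is set to zero when $2i > n_1$, and the checks use $n_2 \ge 2$; both are assumptions made to keep the falling factorials in the denominator non-zero.

```python
from fractions import Fraction
from itertools import combinations


def falling(x, k):
    """x^(k) = x(x-1)...(x-k+1), the convention (3.1)."""
    out = 1
    for j in range(k):
        out *= x - j
    return out


def runs_of_first_kind(seq, i):
    """Number of maximal runs of 'a' of length exactly i."""
    count = length = 0
    for ch in seq + "b":  # the sentinel closes a final run of 'a'
        if ch == "a":
            length += 1
        else:
            if length == i:
                count += 1
            length = 0
    return count


def exact_mean_var(n1, n2, i):
    """Mean and variance of r_1i over all equally likely arrangements."""
    n, vals = n1 + n2, []
    for b_positions in combinations(range(n), n2):
        seq = ["a"] * n
        for pos in b_positions:
            seq[pos] = "b"
        vals.append(runs_of_first_kind("".join(seq), i))
    mean = Fraction(sum(vals), len(vals))
    return mean, Fraction(sum(v * v for v in vals), len(vals)) - mean ** 2


def mood_mean(n1, n2, i):  # (3.2)
    return Fraction(falling(n2 + 1, 2) * falling(n1, i), falling(n1 + n2, i + 1))


def mood_var(n1, n2, i):  # (3.5); assumes n2 >= 2
    n, m = n1 + n2, mood_mean(n1, n2, i)
    pairs = 0 if 2 * i > n1 else Fraction(
        falling(n2, 2) * falling(n2 + 1, 2) * falling(n1, 2 * i), falling(n, 2 * i + 2))
    return pairs + m - m * m


def mood_limit(e1, e2, i):  # (3.6)
    return (e1 ** (2 * i - 1) * e2 ** 3 * ((i + 1) ** 2 * e1 * e2 - i * i * e2 - 2 * e1)
            + e1 ** i * e2 ** 2)


print(all(exact_mean_var(n1, n2, i) == (mood_mean(n1, n2, i), mood_var(n1, n2, i))
          for n1, n2 in [(5, 4), (6, 3), (4, 4), (3, 6)] for i in range(1, n1 + 1)))
N = 100_000
print([float(mood_var(3 * N // 5, 2 * N // 5, i)) / N for i in (1, 2, 3)])
print([mood_limit(0.6, 0.4, i) for i in (1, 2, 3)])
```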


For additive partition functions we have the following theorem:

Theorem 4. Let $f(x)$ be a function defined for all positive integral values of $x$ which fulfills the following conditions:

a) there exists a pair of positive integers, $a$ and $b$, such that

$$\frac{f(a)}{f(b)} \ne \frac{a}{b}; \tag{3.8}$$

b) the series

$$\sum_{i=1}^{\infty} |f(i)|\, e^{i/2} \tag{3.9}$$

converges.

Let $F(V)$, a function of the stochastic sequence $V$, be defined as follows:

$$F(V) = \sum_{i=1}^{s} f(v_i). \tag{3.10}$$

Then, for any real $y$, the probability of the inequality $\bar F(V) < y$ approaches

$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-t^2/2}\, dt$$

as $n \to \infty$ while $e_1$ and $e_2$ remain constant.

The basic idea of the proof of this theorem is the same as that of the proof 
of Theorem 2. We omit all the steps which can be written without difficulty 
by analogy to those in Theorem 2 and present only those where some major 
change is necessary. The numbering of the lemmas will correspond to that of 
Theorem 2. 

Lemma 2. For any fixed pair $a$, $b$ of positive integers such that $a < b$,
(3.ii) - i, 

as $n \to \infty$.

The proof is the same as before. 

The following are the definitions corresponding to those of Theorem 2:

$$B(k;n) = \sum_{i=1}^{k} s_i f(i), \qquad C(k;n) = \sum_{i=k+1}^{n} s_i f(i).$$

Then, as before,

$$F(V) = B(k;n) + C(k;n).$$

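Lemma 4. The statement is the same as for Theorem 2. The following important changes must be made in the proof.

Each sequence $V$ determines the coordinate complex

$$\begin{bmatrix} r_{11}, & r_{12}, & \dots, & r_{1n_1} \\ r_{21}, & r_{22}, & \dots, & r_{2n_2} \end{bmatrix},$$

and also

$$l = \begin{bmatrix} r_{11}, & r_{12} \\ r_{21}, & r_{22} \end{bmatrix}, \qquad l' = \begin{bmatrix} r_{13}, & \dots, & r_{1n_1} \\ r_{23}, & \dots, & r_{2n_2} \end{bmatrix}.$$

The set $H$ is the set of all $V$ whose $L$ satisfy the constraints

$$\begin{aligned}
n e_1 e_2^2 &< r_{11} < n e_1 e_2^2 + \sqrt{n}\,\sigma_{11}, \\
n e_1^2 e_2^2 &< r_{12} < n e_1^2 e_2^2 + \sqrt{n}\,\sigma_{11}, \\
n e_1^2 e_2 &< r_{21} < n e_1^2 e_2 + \sqrt{n}\,\sigma_{11}, \\
n e_1^2 e_2^2 &< r_{22} < n e_1^2 e_2^2 + \sqrt{n}\,\sigma_{11}, \\
n e_1^3 e_2 &< r_{1(3)} < n e_1^3 e_2 + \sqrt{n}\,\sigma_{11},
\end{aligned}$$

where

$$\sigma_{11} = \lim_{n \to \infty} \frac{\sigma(r_{11})}{\sqrt{n}}.$$

The representative family for $H$ is characterized by

$$(l',\ n e_1 e_2^2 + h_{11}),$$

and this family is compared with the family

$$(l',\ n e_1 e_2^2 + 2\sqrt{n}\,\sigma_{11} + h_{11}).$$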

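For the members of the family in $H$,

$$\begin{aligned}
r_{11} &= n e_1 e_2^2 + h_{11} = n m_{11} + h_{11}, \\
r_{12} &= n e_1^2 e_2^2 + h_{12} = n m_{12} + h_{12}, \\
r_{21} &= n e_1^2 e_2 + h_{21} = n m_{21} + h_{21}, \\
r_{22} &= n e_1^2 e_2^2 + h_{22} = n m_{22} + h_{22}, \\
r_{1(3)} &= n e_1^3 e_2 + h_{13} = n m_{13} + h_{13}, \\
r_1 &= n e_1 e_2 + h = n m + h,
\end{aligned}$$

where

$$|r_2 - r_1| \le 1,$$

$$0 < h_{ij} < \sqrt{n}\,\sigma_{11}, \tag{3.12}$$

$$h = h_{11} + h_{12} + h_{13}. \tag{3.13}$$

For the members of the second family,

$$\begin{aligned}
r_{11} &= n m_{11} + 2\sqrt{n}\,\sigma_{11} + h_{11}, \\
r_{12} &= n m_{12} - \sqrt{n}\,\sigma_{11} + h_{12}, \\
r_{21} &= n m_{21} + 2\sqrt{n}\,\sigma_{11} + h_{21} + \theta_{21}, \\
r_{22} &= n m_{22} - \sqrt{n}\,\sigma_{11} + h_{22} + \theta_{22}, \\
r_{1(3)} &= n m_{13} + h_{13}, \\
r_1 &= n m + \sqrt{n}\,\sigma_{11} + h,
\end{aligned}$$

with $h = h_{11} + h_{12} + h_{13}$, $|r_2 - r_1| \le 1$, $|\theta_{21}| < 1$ and $|\theta_{22}| < 1$.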

The expression (2.37) corresponds to the expression (3.14), with $|\theta| < 1$:


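$$\frac{(nm_{11} + h_{11})!\,(nm_{12} + h_{12})!\,(nm_{21} + h_{21})!\,(nm_{22} + h_{22})!}{(nm + h)!\,(nm + h)!} \times \frac{(nm + h + \sqrt{n}\,\sigma_{11})!\,(nm + h + \sqrt{n}\,\sigma_{11} + \theta)!}{(nm_{11} + 2\sqrt{n}\,\sigma_{11} + h_{11})!\,(nm_{12} - \sqrt{n}\,\sigma_{11} + h_{12})!\,(nm_{21} + 2\sqrt{n}\,\sigma_{11} + h_{21} + \theta_{21})!\,(nm_{22} - \sqrt{n}\,\sigma_{11} + h_{22} + \theta_{22})!} \tag{3.14}$$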

which exceeds 

(nm + A)^'" X (nma + 2Vn <ni+h u r W " 

X (wh — -s/n irn + Au)^" * u 
X (nm 2 i -f- 2\/n an + An) -v '«»11 

X (nniM - Vnau 4- A^*' 11 
Employing Lemma 2, we find that (3.15) equals 

/ + ± yv.-.„ / 2y£.m + *»'p' : 

\ nmj \ nmu ) 

— s/n an + Au^ v, '‘ 


(3.15) 


(3.16) 


( 


X 1 + 


nmu 


" '1 I 


X (^1 + g n + ht A 

V n^iji / 

xfi4- ~Vnau + h n \ 

\ nmn J 

In view of (3.12) and (3.13), (3.16) exceeds

(l + 3 X5 !sy /,m X (l - v«sf" 

\ nmu J \ nmu / 


i\/ n ff| 1 

•/n m 


(3.17) 


3\/nan^ 2%/ '‘ 


X (l + x fi_ 

\ nmn / \ nwijj / 





which, for sufficiently large n, in turn exceeds 


(3.18) 



ynu mi 2 mu rrin/ 


Lemma 5. The statement is the same as for Theorem 2. The proof then proceeds as follows. We have

$$\sigma^2(C(k;n)) \le \left(\sum_{j=1}^{2} \sum_{i=k+1}^{n_j} |f(i)|\,\sigma(r_{ji})\right)^2. \tag{3.19}$$


Examining (3.5) and (3.6), we may see without difficulty that the second of the three terms of the right member of (3.5) (after removal of parentheses) is asymptotically equal to $n$ times the last term of the right member of (3.6). Hence the other two terms of the right member of (3.5) are asymptotically equal to $n$ times the right member of (3.6) without its last term. Now when

$$\frac{i + 1}{i} < \frac{1}{\sqrt{e_1}},$$

which always occurs when $i$ is equal to or greater than a sufficiently large fixed integer $\mu$, the part of the right member of (3.6) in square brackets is easily seen to be negative. Hence, from the definition of asymptotic equivalence, it follows that for all $n$ sufficiently large

$$\frac{n_2^{(2)}\,(n_2 + 1)^{(2)}\, n_1^{(2\mu)}}{n^{(2\mu+2)}} - \left(\frac{(n_2 + 1)^{(2)}\, n_1^{(\mu)}}{n^{(\mu+1)}}\right)^2 < 0 \tag{3.20}$$

and

$$\frac{(n_2 + 1)^{(2)}\, n_1^{(\mu)}}{n^{(\mu+1)}} < 2n e_1^{\mu} e_2^2 < 2n e^{\mu}. \tag{3.21}$$

Hence, for all $n$ sufficiently large,

$$\sigma^2(r_{1\mu}) < 2n e^{\mu}. \tag{3.22}$$

Now consider the expression (3.5) for i = μ and i = μ + 1. Passage from μ to μ + 1 multiplies the first term of the right member of (3.5) by

(3.23) $\dfrac{(n_1 - 2\mu)(n_1 - 2\mu - 1)}{(n - 2\mu - 2)(n - 2\mu - 3)}$,


and the third term of the right member by

(3.24) $\dfrac{(n_1 - \mu)^2}{(n - \mu - 1)^2}$.


It is easy to see that for large but fixed μ and all n greater than a lower bound which is a function of μ only, the expression (3.23) is less than the expression (3.24). Hence, in view of (3.20), the sum of the first and third terms of the





right member of (3.5) for i = μ + 1 is negative. Now consider what happens to the second term of the right member of (3.5) when i goes from μ to μ + 1. It is multiplied by


(3.25) [expression illegible in the source]

which, also for large but fixed μ and all n larger than a lower bound which is a function of μ only, is easily seen to be less than c. Consequently

(3.26) $\sigma^2(r_{1(\mu+1)}) < 2nc^{\mu+1}$.


It can be seen without difficulty that such a passage of (3.5) to the next higher index is always accompanied by multiplication by expressions similar to (3.23), (3.24), and (3.25), for which similar inequalities hold, and that consequently

(3.27) $0 < \sigma^2(r_{1i}) < 2nc^{i}$,

and for similar reasons

$0 < \sigma^2(r_{2i}) < 2nc^{i}$,

for all i not less than μ and for all n greater than a lower bound which is a function of μ only (although it may be necessary to increase the original μ so that both of the last two inequalities hold). The required result follows from (3.19) and the convergence of the series (3.9).

The proof of Theorem 4 follows along the same lines as that of Theorem 2.

When $f(x) \equiv 1$, $F(V) \equiv U(V)$, the statistic discussed in [2]. Other such results follow from specialization of f(x). Theorem 4 may also be generalized so that the elements which add up to $n_1$ are operated on by a function $f_1$, while the elements which add up to $n_2$ are operated on by another function $f_2$, but this is easy to see and we do not go into the details.


4. Tests of hypotheses in the non-parametric case. The great advances that have been made in mathematical statistics in recent years have been in two directions. On the one hand, the foundations of statistics, the theory of estimation and of testing hypotheses have been put on a rigorous basis of probability theory, and on the other, powerful methods for obtaining critical regions and confidence intervals and criteria for appraising their efficacy have been developed. Most of these developments have this feature in common, that the distribution functions of the various stochastic variables which enter into their problems are assumed to be of known functional form, and the theories of estimation and of testing hypotheses are theories of estimation of, and of testing hypotheses about, one or more parameters, finite in number, the knowledge of which would completely determine the various distribution functions involved. We shall refer to this situation for brevity as the parametric case, and denote the opposite situation, where the functional forms of the distributions are unknown, as the non-parametric case.



ADDITIVE PARTITION FUNCTIONS 


265 


The literature of theoretical statistics, therefore, deals principally with the parametric case. The reasons for this are perhaps partly historic, and partly the fact that interesting results could more readily be expected to follow from the assumption of normality. Another reason is that, while the parametric case was for long developed on an intuitive basis, progress in the non-parametric case requires the use of modern notions. However, the needs of theoretical completeness and of practical research require the development of the theory of the non-parametric case. The purpose of the following section is to contribute to this theory.

Brief mention of some of the literature may be made here. The problem of parametric estimation by confidence intervals was put on a rigorous foundation by Neyman [4] and extended to the estimation of distribution functions in the non-parametric case by means of confidence belts by Wald and Wolfowitz [5]. Problems of testing non-parametric hypotheses have been treated in various places. The rank correlation coefficient has been used for a long time to test the independence of two variates. Its distribution was shown to be asymptotically normal by Hotelling and Pabst [6], and its small sample distribution was discussed by Olds [7]. The problem of two samples has been discussed, among others, by Thompson [8], Dixon [9] and Wald and Wolfowitz [2]. In 1937, Friedman [10] posed the non-parametric analogue of the problem in the analysis of variance and proposed a very ingenious solution.

All these proposed solutions have this in common, that there exists no general principle which can be applied in each particular case to obtain a critical region, a role which is performed in the parametric case by Fisher's principle of maximum likelihood and the likelihood ratio criterion (Neyman and Pearson [11]), whose validity, at least for large samples, has been established by Wald ([12], [13]). In each problem the solutions proposed have been intuitive and usually based on an analogy to the corresponding problem in the parametric case. Thus the principal justification for the use of the rank correlation coefficient is that its distribution is independent of the unknown distribution function (under the null hypothesis) and that its structure resembles that of the ordinary correlation coefficient. But any function of the order relations among the variates (cf. [2], p. 148) has a distribution which is independent of the unknown population distribution under the null hypothesis. The same objection may be made to papers [8], [9], [10], [2], except that in [2], although the solution there proposed is an intuitive one, the criterion of consistency is extended from the parametric case to the non-parametric one. The fulfilment of this condition is a minimal requirement of a good test, and on this basis the solution proposed in one of the previous papers cannot be considered a good one.

In the following section we shall show that the likelihood ratio criterion may be extended to the non-parametric case, where the test must be made on the order relations among the observations, and that for a certain class of these problems, which fulfill the same requirement as that for the application of the likelihood ratio criterion in the parametric case, it would thus appear to furnish





a general method by which statistics may be obtained for a specific problem. We shall show this by applying it to the problem of two samples. This will serve to explain the method. Another problem will be discussed later. The ultimate justification of any statistic must be its power function, which ought therefore to constitute the next subject of investigation for these problems. Since for problems in the non-parametric case it is almost certain that uniformly most powerful tests do not exist, the question of determining the alternatives with respect to which proposed tests are powerful is particularly important.

5. The problem of two samples. Let X and Y be two stochastic variables with the distribution functions f(x) and g(x), respectively. (The term distribution function will always denote the cumulative distribution function. The letter P followed by an expression in braces will stand for the probability of the relation in braces. Hence $P\{X < x\} = f(x)$ for all x.) f(x) and g(x) are assumed continuous. The $n_1$ observations $x_1, x_2, \ldots, x_{n_1}$ and $n_2$ observations $y_1, y_2, \ldots, y_{n_2}$ are made on X and Y respectively. The (null) hypothesis to be tested is that $f(x) \equiv g(x)$. The admissible alternatives are all continuous distribution functions f(x) and g(x) such that $f(x) \not\equiv g(x)$. The $n_1 + n_2 = n$ observations are arranged in ascending order of size, thus: $Z = z_1, z_2, \ldots, z_n$, where $z_1 < z_2 < \cdots < z_n$ (the probability that $z_i = z_{i+1}$ is 0). Let $V = v_1, v_2, \ldots, v_n$ be a sequence defined as follows: $v_i = 0$ if $z_i$ is a member of the set $x_1, x_2, \ldots, x_{n_1}$, and $v_i = 1$ if $z_i$ is a member of the set $y_1, y_2, \ldots, y_{n_2}$. Then any statistic used to test the null hypothesis must be a function only of V ([2], p. 148).
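A minimal sketch of how V is formed, assuming the two samples are Python lists of distinct floats (ties have probability 0 under continuity); `build_v` is a hypothetical helper name, not taken from the paper. The final assertion illustrates why only V matters: applying the same strictly increasing map (here `math.exp`) to every observation leaves V unchanged.

```python
import math


def build_v(xs, ys):
    """Pool the two samples, sort them, and record 0 for an x and 1 for a y."""
    pooled = [(x, 0) for x in xs] + [(y, 1) for y in ys]
    pooled.sort(key=lambda pair: pair[0])  # z_1 < z_2 < ... < z_n
    return [label for _, label in pooled]


xs = [0.3, 1.7, 2.2]        # made-up observations on X
ys = [0.9, 1.1, 3.5, 4.0]   # made-up observations on Y
v = build_v(xs, ys)
print(v)  # [0, 1, 1, 0, 0, 1, 1]

# A strictly increasing transformation of the line leaves V unchanged.
assert build_v([math.exp(x) for x in xs], [math.exp(y) for y in ys]) == v
```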

We now apply the method of Neyman and Pearson [11] as follows. $\Omega$ is the totality of all couples $(d_1(x), d_2(x))$ of continuous distribution functions. The set $\omega$, a subset of $\Omega$, is the totality of all couples of distribution functions for which $d_1 \equiv d_2$. The sample space is the totality of all sequences V. The null hypothesis states that $(f, g)$ is a member of $\omega$. The admissible alternatives are that $(f, g)$ is a member of $\Omega$ not in $\omega$. The distribution of any function of V is the same for all members of $\omega$. Hence this essential requirement on the statistic to be selected for the application of the likelihood ratio criterion (cf. [11]) is satisfied by any statistic which is a function of V alone. Furthermore, all sequences V have the same probability if the null hypothesis is true ([2], p. 149). The numerator of the likelihood ratio is therefore a function only of $n_1$ and $n_2$, is the same for all V, and is therefore of no further interest. Hence $T'(V)$, a function of V which is a monotonic function of the likelihood ratio for this problem, may be defined as the denominator of the likelihood ratio, as follows: Let $P\{V; (d_1, d_2)\}$ be the probability of V when $f \equiv d_1$ and $g \equiv d_2$. Then

(5.1) $T'(V) = \max_{\Omega} P\{V; (d_1, d_2)\}$.

The critical values of $T'(V)$ are the large values. However, we may use instead of $T'(V)$ a convenient monotonic function of $T'(V)$.





As an approximation to $T'(V)$ we propose $T(V)$, a statistic which is obtained on the assumption that, for a given V, a couple $(d_1^*, d_2^*)$ which is essentially the same as that of the two sample distribution functions corresponding to the particular V approximates a couple which maximizes the right member of (5.1). (We say "a" couple because it cannot be unique.) This assumption seems a reasonable one, particularly for large samples. Only the form of $(d_1^*, d_2^*)$ is assumed, and the missing parameters are obtained in accordance with (5.1). Before describing the matter precisely, it must be stressed that this is offered only as a plausible approximation. For certain extreme V, for example those where zeros and ones nearly alternate, this is definitely not the maximizing couple. In spite of this the statistic T(V) assigns to these V values which are furthest removed from the critical region for any level of significance, as indeed any good statistic should.

We first define a "run" as in [2], p. 149. A subsequence $v_{(t+1)}, v_{(t+2)}, \ldots, v_{(t+r)}$ of V (where r may also be 1) is called a "run" if $v_{(t+1)} = v_{(t+2)} = \cdots = v_{(t+r)}$, and if $v_t \ne v_{(t+1)}$ when $t > 0$, and if $v_{(t+r)} \ne v_{(t+r+1)}$ when $t + r < n$. Let $l_{1j}$ be the number of elements in the jth run of elements 0, and $l_{2j}$ the number of elements in the jth run of elements 1. Suppose for a moment that the first element in V is a 0. Consider the following situation. There is an interval $[a_1, a_2]$, $a_1 < a_2$, on the line $-\infty < x < +\infty$ such that

$P\{a_1 < X < a_2\} > 0, \quad P\{a_1 < Y < a_2\} = 0, \quad P\{X < a_1\} = P\{Y < a_1\} = 0.$

This is followed by an interval $[b_1, b_2]$, $b_1 = a_2$, such that $P\{b_1 < X < b_2\} = 0$, $P\{b_1 < Y < b_2\} > 0$. This is in turn followed by an interval $[a_3, a_4]$, $a_3 = b_2$, such that $P\{a_3 < X < a_4\} > 0$, $P\{a_3 < Y < a_4\} = 0$, etc. It is clear that the lengths and location of the intervals described are immaterial, provided only that they do not overlap. Also the distributions of X and Y within each interval are immaterial, provided only that they are continuous. All that matters for finding $P\{V; (d_1^*, d_2^*)\}$ is that the number and the order of the disjunct intervals shall be the same as those of the runs in V (i.e., intervals of positive probability for X must alternate with intervals of positive probability for Y; the number of intervals of positive probability for X and for Y must equal respectively the number of runs of the element 0 and the number of runs of the element 1; and the probability of the first interval on the left shall be positive for X or for Y according as the first run in V is of elements 0 or of elements 1, with the same relation obtaining between the last interval on the right and the last run in V), and the probability of these intervals. Let $P_{1j}$ be the sought-for probability of the interval which corresponds to the jth run of elements 0, and $P_{2j}$ the probability of the interval which corresponds to the jth run of elements 1. In order to obtain V, it is necessary that the elements constituting each run shall fall into its corresponding interval. Then clearly by the multinomial theorem

(5.2) $P\{V; (d_1^*, d_2^*)\} = \prod_i \Big[ n_i! \Big(\prod_j l_{ij}!\Big)^{-1} \prod_j P_{ij}^{\,l_{ij}} \Big]$





where $i = 1, 2$ and where, when i is fixed, the product with respect to j is taken over all runs of the corresponding element. The right member of (5.2) is to be maximized with respect to the $P_{ij}$, subject of course to the constraints

(5.3) $\sum_j P_{ij} = 1 \quad (i = 1, 2)$.

Then it may easily be verified that the maximum occurs when

(5.4) $P_{ij} = \dfrac{l_{ij}}{n_i} \quad (i = 1, 2)$.

For, after multiplying by a constant and taking the logarithm, we introduce two Lagrange multipliers $\mu_1$ and $\mu_2$ so that the maximizing $P_{ij}$ are given by the equations (5.3) and those obtained by equating to zero all the partial derivatives of

$\sum_i \sum_j \left( l_{ij} \log P_{ij} - \mu_i P_{ij} \right)$.

The latter are therefore

$\dfrac{l_{ij}}{P_{ij}} - \mu_i = 0$

for all j, whence (5.4) follows (summing $l_{ij} = \mu_i P_{ij}$ over j and using (5.3) gives $\mu_i = n_i$). It is easy to see that the extremum thus obtained is a maximum and also an absolute maximum. The sought-for statistic T(V) is then the right member of (5.2) after the results (5.4) have been inserted. It may be simplified by removing all factors which are functions only of $n_1$ and $n_2$ (since these will then be the same for all V) and recalling that

(5.5) $\sum_j l_{ij} = n_i \quad (i = 1, 2)$.
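The Lagrange-multiplier argument can be checked numerically. The sketch below, with made-up run lengths for a single element i and helper names of our own, evaluates $\sum_j l_{ij} \log P_{ij}$ (the part of the logarithm of (5.2) that depends on the $P_{ij}$) at the solution (5.4) and at many random points satisfying (5.3). The random comparison is only a sanity check, not a proof.

```python
import math
import random


def log_objective(lengths, probs):
    """sum_j l_j * log P_j: the part of log (5.2) that depends on the P's."""
    return sum(l * math.log(p) for l, p in zip(lengths, probs))


def maximizer(lengths):
    """The solution (5.4): P_j = l_j / n_i."""
    n_i = sum(lengths)
    return [l / n_i for l in lengths]


def random_feasible_point(k, rng):
    """A random vector of positive numbers summing to 1, i.e. satisfying (5.3)."""
    w = [rng.uniform(0.01, 1.0) for _ in range(k)]
    total = sum(w)
    return [x / total for x in w]


lengths = [3, 1, 2, 4]   # made-up run lengths l_i1, ..., l_i4 of one element
best = log_objective(lengths, maximizer(lengths))
rng = random.Random(0)
for _ in range(10_000):
    trial = random_feasible_point(len(lengths), rng)
    assert log_objective(lengths, trial) <= best + 1e-12
print(maximizer(lengths))  # [0.3, 0.1, 0.2, 0.4]
```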

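After (5.4) is inserted and (5.5) is used, the right member of (5.2) becomes $\prod_i n_i!\, n_i^{-n_i} \prod_j l_{ij}^{\,l_{ij}} / l_{ij}!$, of which only the last product depends on V. It will be convenient to take the logarithm of the resulting expression, so that with a slight change of notation we finally have

(5.6) $T(V) = \sum_i \sum_j \bar{l}_{ij}$,

where

(5.7) $\bar{l}_{ij} = \log\left( \dfrac{l_{ij}^{\,l_{ij}}}{l_{ij}!} \right)$.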
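A minimal sketch of (5.6) and (5.7), assuming V is given as a Python list of 0s and 1s; `runs`, `lbar` and `t_statistic` are our own names. The factorial enters through `math.lgamma(l + 1)`, which equals $\log l!$, so no large integers are formed.

```python
import math


def runs(v):
    """Maximal runs of equal elements, as (element, length) pairs in order."""
    out = []
    for x in v:
        if out and out[-1][0] == x:
            out[-1] = (x, out[-1][1] + 1)
        else:
            out.append((x, 1))
    return out


def lbar(l):
    """(5.7): log(l**l / l!) = l*log(l) - log(l!), with log(l!) = lgamma(l + 1)."""
    return l * math.log(l) - math.lgamma(l + 1)


def t_statistic(v):
    """(5.6): the sum of lbar over all runs of both elements."""
    return sum(lbar(length) for _, length in runs(v))


v = [0, 1, 1, 0, 0, 1, 1]
print(runs(v))                   # [(0, 1), (1, 2), (0, 2), (1, 2)]
print(round(t_statistic(v), 3))  # 2.079, i.e. three runs of length 2: 3 * log 2
```

An alternating sequence such as 0, 1, 0, 1 has only runs of length 1, so its T(V) is 0, the value furthest from the critical region, in line with the remark above about nearly alternating V.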

This result is immediately extensible to the problem of k samples, and by way of summary we recapitulate it as follows:

Let there be given k stochastic variables $X_1, \ldots, X_k$ with the respective distribution functions $f_1(x), \ldots, f_k(x)$, about which nothing is known except that they are continuous. Random independent observations, $n_j$ in number, are made on $X_j$ $(j = 1, \ldots, k)$. It is desired to test the hypothesis that $f_1 \equiv f_2 \equiv \cdots \equiv f_k$, the admissible alternatives being all k-tuples of continuous distribution functions. The sequence V is obtained from the sequence Z by





replacing an observation on $X_i$ by the element i. Let $l_{ij}$ be the number of elements in the jth run of elements i. Then the corresponding statistic for testing the null hypothesis is $T_k(V)$, or any monotonic function of it, where

$T_k(V) = \sum_{i=1}^{k} \sum_j \bar{l}_{ij}$

and $\bar{l}_{ij}$ is given by (5.7). The large values of $T_k(V)$ are the critical values.

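Let $r_{ij}$ denote the number of runs of length j in the elements i. Let $\sum_j r_{ij} = r_i$. Of course $\sum_j j\, r_{ij} = n_i$. Also let $s_j = \sum_i r_{ij}$. Then, writing $\bar{j} = \log(j^j / j!)$ as in (5.7),

(5.8) $T_k(V) = \sum_i \sum_j \bar{j}\, r_{ij}$

and

(5.9) $T_k(V) = \sum_j \bar{j}\, s_j$.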
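The bookkeeping behind (5.8) and (5.9) can be sketched as follows, assuming the k elements are labeled 0, ..., k − 1 (a labeling chosen for convenience) and using a made-up sequence. The loop checks $\sum_j j\, r_{ij} = n_i$ for each element, and `t_k_via_s` (our name) evaluates (5.9).

```python
import math
from collections import Counter


def bar(j):
    """bar(j) = log(j**j / j!), the numbers of (5.7)."""
    return j * math.log(j) - math.lgamma(j + 1)


def run_length_counts(v):
    """r[(i, j)] = number of runs of length j made up of element i."""
    r = Counter()
    start = 0
    for t in range(1, len(v) + 1):
        if t == len(v) or v[t] != v[start]:
            r[(v[start], t - start)] += 1
            start = t
    return r


def t_k_via_s(v):
    """T_k(V) by (5.9): sum over lengths j of bar(j) * s_j, with s_j = sum_i r_ij."""
    s = Counter()
    for (_, j), count in run_length_counts(v).items():
        s[j] += count
    return sum(bar(j) * s_j for j, s_j in s.items())


v = [2, 2, 0, 1, 1, 1, 0, 0, 2]   # made-up sequence for k = 3
r = run_length_counts(v)
for i in set(v):
    # sum_j j * r_ij = n_i for every element i
    assert sum(j * c for (e, j), c in r.items() if e == i) == v.count(i)
print(round(t_k_via_s(v), 3))  # 2.89
```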

If a table were constructed of the numbers (5.7) from 1 to 50, say, or from 1 to 100, this would cover most of the cases arising in practice. The calculation of $T_k(V)$ by means of (5.9) would then be so simple that it could be performed very expeditiously by an ordinary clerk, and with very much less labor than is required for most statistics in common use, like the correlation coefficient, for example. As a matter of interest we note that

$\bar{1} = 0, \quad \bar{2} = .693, \quad \bar{3} = 1.50, \quad \bar{4} = 2.37, \quad \bar{5} = 3.26,$

and that

(5.10) $\bar{p} < p$,

where p is any integer > 1. (5.10) follows from the fact that

$p! > \sqrt{2\pi p}\; p^{p} e^{-p}$,

so that $\bar{p} = \log(p^p / p!) < p - \tfrac{1}{2}\log(2\pi p) < p$.
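A short sketch of the proposed table, here for p = 1, ..., 100. It reproduces the values quoted above (to three decimals) and checks (5.10) together with the Stirling lower bound used to prove it; the name `bar` is ours.

```python
import math


def bar(p):
    """bar(p) = log(p**p / p!)."""
    return p * math.log(p) - math.lgamma(p + 1)


table = {p: bar(p) for p in range(1, 101)}
print([round(table[p], 3) for p in range(1, 6)])  # [0.0, 0.693, 1.504, 2.367, 3.26]

for p in range(2, 101):
    # Stirling's lower bound p! > sqrt(2*pi*p) * p**p * e**(-p), in logarithms ...
    assert math.lgamma(p + 1) > 0.5 * math.log(2 * math.pi * p) + p * math.log(p) - p
    # ... which gives bar(p) < p - log(2*pi*p)/2 < p, i.e. (5.10).
    assert table[p] < p
```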

The distribution of T(V) may be found for small samples by enumerating the sequences V, all of which have the same probability under the null hypothesis, and assigning to each V its T(V). The critical region consists of the V's for which T(V) takes the largest values, taken in sufficient number to make the critical region of proper size. It will not be necessary to enumerate all the V's, since it is readily apparent that certain V's can never belong to a critical region of any reasonable size. (Roughly speaking, a V with a large number of runs of short length will yield a small T(V), and vice versa.) For large samples, the result of Section 3 is available, with $f(x) = \bar{x}$. From (5.10) it follows





easily that the corresponding series (3.9) is convergent, so that T(V) is asymptotically normally distributed. It must be remembered when using tables of the normal distribution that the critical region of T(V) lies in only one "tail" of the normal curve. The greatest difficulty will occur for samples of moderate size. Methods like those of Olds [7] will probably help there. It is highly unlikely that any practicable formula which would give the exact distribution of T(V) exists.
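The small-sample enumeration described above can be sketched as follows. This is our own illustration, not a procedure from the paper: it lists all $\binom{n}{n_1}$ sequences V, gives each the exact null probability $1/\binom{n}{n_1}$, and collects the largest values of T(V) whose total probability does not exceed a chosen level α. It does not randomize, so the attained size may fall below α. Values of T(V) are rounded to 9 decimals so that equal values computed in different orders are merged.

```python
import math
from fractions import Fraction
from itertools import combinations


def bar(l):
    return l * math.log(l) - math.lgamma(l + 1)


def t_statistic(v):
    """T(V) of (5.6): sum of bar(length) over all maximal runs in v."""
    total, start = 0.0, 0
    for t in range(1, len(v) + 1):
        if t == len(v) or v[t] != v[start]:
            total += bar(t - start)
            start = t
    return total


def null_distribution(n1, n2, ndigits=9):
    """Exact null distribution of T(V): each arrangement of n1 zeros and
    n2 ones has probability 1 / C(n1 + n2, n1)."""
    n = n1 + n2
    weight = Fraction(1, math.comb(n, n1))
    dist = {}
    for ones in combinations(range(n), n2):
        v = [0] * n
        for pos in ones:
            v[pos] = 1
        key = round(t_statistic(v), ndigits)
        dist[key] = dist.get(key, 0) + weight
    return dict(sorted(dist.items()))


def critical_values(dist, alpha):
    """Largest values of T, accumulated from the top while the total
    probability does not exceed alpha."""
    chosen, size = [], Fraction(0)
    for t in sorted(dist, reverse=True):
        if size + dist[t] > alpha:
            break
        chosen.append(t)
        size += dist[t]
    return chosen, size


dist = null_distribution(4, 4)
assert sum(dist.values()) == 1
print(critical_values(dist, Fraction(1, 20)))
```

For $n_1 = n_2 = 4$ and α = 1/20, the region found consists of the two sequences made of one run of each element, with size 2/70 = 1/35.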

A few brief remarks may be made here on a related problem. Suppose we have observations from two bivariate populations, about the distributions of both of which nothing is known except that they are continuous, and it is sought to test whether the two populations have the same distribution functions. Suppose further that it were required that the statistic used for this purpose be invariant under any topologic transformation of the whole plane into itself. At this point we quote the following topologic theorem, the proof of which was communicated to the author by Dr. Herbert Robbins: Let $x_1, y_1, x_2, y_2, \ldots, x_p, y_p$ be any 2p distinct points in the plane. There exists a topologic transformation of the whole plane into itself which takes $x_i$ into $y_i$ $(i = 1, 2, \ldots, p)$. As a consequence of this theorem (any two samples of the same sizes can be carried into one another by such a transformation) we get the absurd result that the required statistic must be a constant. Hence this statistical problem can have no solution.

As a matter of interest, this statistical problem would have no solution even if it were not for the topologic theorem. The fact is that a continuous distribution on a line remains continuous under a topologic transformation of the whole line into itself, but a continuous distribution in a k-dimensional (Euclidean) space (k > 1) may become discontinuous under a topologic transformation of the whole space into itself. (The probability distribution in the first space always determines a probability distribution in the transformed space, for probability functions are defined over all Borel sets of the space (cf. [15], p. 7), and a topologic transformation carries Borel sets into Borel sets (cf. [16], p. 195, Theorem II).) Consider the following example in the plane: A bivariate distribution function assigns probability 1 to a line L oblique to the coordinate axes, while any interval which contains no segment of the line L has probability 0. On the line L the (one-dimensional) probability distribution may be arbitrary, provided it is continuous. The bivariate distribution function is without difficulty seen to be continuous. Now rotate the coordinate axes until one of them is parallel to L. It is easy to see that after the rotation the bivariate distribution function is discontinuous.
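The example can be made concrete under one assumption of ours: the continuous distribution on L is uniform on the segment from (0, 0) to (1, 1), so that $X_1 = X_2 = U$ with U uniform on (0, 1). Rotating the axes by 45 degrees turns L into the new first axis, giving $X_1' = \sqrt{2}\,U$ and $X_2' = 0$. The sketch compares the two distribution functions $P\{X_1 < x_1; X_2 < x_2\}$ near a point.

```python
import math


def clamp01(t):
    return min(max(t, 0.0), 1.0)


def cdf_oblique(x1, x2):
    """P{X1 < x1, X2 < x2} with X1 = X2 = U: all mass on the oblique line x2 = x1."""
    return clamp01(min(x1, x2))


def cdf_rotated(x1, x2):
    """Same distribution after rotating the axes by 45 degrees:
    X1' = sqrt(2) * U and X2' = 0, so all mass lies on the line x2 = 0."""
    return clamp01(x1 / math.sqrt(2)) * (1.0 if x2 > 0 else 0.0)


eps = 1e-9
# Before the rotation a tiny change in x2 changes F only by a tiny amount ...
print(cdf_oblique(0.5, 0.5 + eps) - cdf_oblique(0.5, 0.5 - eps))
# ... after it, F jumps by about 0.354 across x2 = 0.
print(cdf_rotated(0.5, eps) - cdf_rotated(0.5, -eps))
```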

The question of whether a useful statistical problem could be obtained by properly delimiting the class of transformations which are to leave the statistic invariant, and the solution of such a problem, remain to be investigated.

6. The problem of the independence of several variates. This is an important practical problem and one of the earliest discussed in the literature (cf., for example, [6]). Let $X_1$ and $X_2$ be stochastic variables with the joint (cumulative) distribution function $F(x_1, x_2)$, which is known to be continuous in both variables





jointly (i.e., $F(x_1, x_2) = P\{X_1 < x_1;\ X_2 < x_2\}$, where the right member is the probability of the occurrence of both the relations in braces). The marginal distributions $f_1(x_1)$ and $f_2(x_2)$ of $X_1$ and $X_2$ respectively are defined as follows:

$f_1(x_1) = P\{X_1 < x_1\} = \lim_{x_2 \to +\infty} F(x_1, x_2)$,
$f_2(x_2) = P\{X_2 < x_2\} = \lim_{x_1 \to +\infty} F(x_1, x_2)$.

(It is easy to see that the continuity of $F(x_1, x_2)$ implies the continuity of $f_1(x_1)$ and $f_2(x_2)$.)

The n random, independent pairs of observations $(x_{11}, x_{21}), \ldots, (x_{1n}, x_{2n})$ are made on $X_1$ and $X_2$. The null hypothesis states that

(6.1) $F(x_1, x_2) \equiv f_1(x_1) \cdot f_2(x_2)$,

i.e., that $X_1$ and $X_2$ are independent. The alternative hypotheses are that $F(x_1, x_2)$ does not satisfy (6.1).¹

Let the set $x_{11}, x_{12}, x_{13}, \ldots, x_{1n}$ be arranged in order of ascending size, thus: $Z = z_1, z_2, z_3, \ldots, z_n$, where $z_1 < z_2 < \cdots < z_n$. The jth member of this sequence will be said to have the rank j. In the same manner ranks are assigned to the $x_{2j}$ $(j = 1, \ldots, n)$. (It is easy to see that, since $f_1(x_1)$ and $f_2(x_2)$ are continuous, the probability that $z_j = z_{j+1}$ is 0, etc.) In the sequence Z the element $z_j$ $(j = 1, \ldots, n)$ is replaced by the rank of its associated observation on $X_2$. We obtain a permutation of the integers $1, 2, \ldots, n$, which we denote by R. If in the procedure for obtaining R we had reversed the roles of the $x_{1j}$ and $x_{2j}$, we would have obtained the permutation R'. It is easy to see that any statistic, say M'', used to test the null hypothesis must be a function only of R, with the added proviso that $M''(R) = M''(R')$. (The rank correlation coefficient is such a statistic.) Under the null hypothesis all the R have the same probability.
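A minimal sketch of the construction of R and R', assuming the paired observations are given as two Python lists of distinct floats; `ranks` and `build_r` are hypothetical helper names and the data are made up.

```python
def ranks(values):
    """rank[k] = 1-based position of values[k] when the values are sorted."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    rank = [0] * len(values)
    for position, k in enumerate(order, start=1):
        rank[k] = position
    return rank


def build_r(first, second):
    """Go through the observations on the first variate in ascending order and
    write down the rank of the paired observation on the second variate."""
    second_ranks = ranks(second)
    order = sorted(range(len(first)), key=lambda k: first[k])
    return [second_ranks[k] for k in order]


x1 = [2.3, 0.4, 1.8, 3.1]   # made-up observations on X_1
x2 = [0.7, 1.5, 0.2, 2.9]   # the paired observations on X_2
R = build_r(x1, x2)
R_prime = build_r(x2, x1)   # roles of the two variates reversed
print(R, R_prime)           # [3, 1, 2, 4] [2, 3, 1, 4]
```

In this example R' is the inverse permutation of R. That holds in general: if the pair whose $X_1$-rank is j has $X_2$-rank $a_j$, then R puts $a_j$ in position j and R' puts j in position $a_j$.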

The procedure of applying the likelihood ratio principle to this problem would then be as follows. $\Omega$ is the totality of all bivariate distribution functions $H(x_1, x_2)$ which are continuous in both variables jointly. The respective marginal distributions corresponding to $H(x_1, x_2)$ will be denoted by $h_1(x_1)$ and $h_2(x_2)$. $\omega$ is a subset of $\Omega$ which consists of all $H(x_1, x_2)$ for which $H(x_1, x_2) \equiv h_1(x_1) \cdot h_2(x_2)$. The sample space is the totality of all sequences R. The null hypothesis states that $F(x_1, x_2)$ is a member of $\omega$. The admissible alternatives are that $F(x_1, x_2)$ is a member of $\Omega$ not in $\omega$. The distribution of any function of R is the same for all members of $\omega$. Thus the essential requirement for the applicability of the likelihood ratio criterion is fulfilled. All sequences R have the same probability for all members of $\omega$; hence the numerator of the likelihood ratio is a function only of n, which may therefore be ignored. We may then define $M'(R)$, a monotonic function of the likelihood ratio, as the denominator of the likelihood ratio, thus:

(6.2) $M'(R) = \max_{\Omega} P\{R; H(x_1, x_2)\}$,

where $P\{R; H(x_1, x_2)\}$ is the probability of R when $H(x_1, x_2)$ is the joint distribution function of $X_1$ and $X_2$. The critical values of $M'(R)$ are the large values.

¹ It is easy to see that the independence or dependence of two stochastic variables is not a property which will remain invariant under a topologic transformation of the plane into itself. We therefore require of the statistic only that it be invariant under topologic transformation of each variable into itself, separately.

We now propose an approximation to $M'(R)$ which we shall call $M(R)$. We do this by describing a distribution function $H^*(x_1, x_2)$ for each R which seems a plausible approximation to a maximizing distribution function. It may be derived from certain assumptions about the nature of the maximizing distribution function, which we omit. The remarks made in the preceding section about the character of the approximation apply here as well. As before, we specify only the form of the function and leave certain parameters, finite in number, to be determined in accordance with (6.2). (If the construction of $H^*(x_1, x_2)$ should appear somewhat involved, this is due only to the analytic description. A sketch will show the essential simplicity of the situation.) We then have

$M(R) = P\{R; H^*(x_1, x_2)\}$.

Let $R = a_1, a_2, \ldots, a_n$ be a given permutation of the integers 1 to n. A subsequence $a_{(i+1)}, a_{(i+2)}, \ldots, a_{(i+l)}$ will be called a run of length l if the following conditions are fulfilled:

(6.3) The indices of the a's are consecutive;

(6.4) If l' is any integer such that $1 \le l' < l$, then $|a_{(i+l')} - a_{(i+l'+1)}| = 1$;

(6.5) If $i > 0$, $|a_i - a_{(i+1)}| > 1$;

(6.6) If $i + l < n$, $|a_{(i+l)} - a_{(i+l+1)}| > 1$.

The run will be called an ascending run or a descending run according as $a_{(i+1)} - a_{(i+2)} = -1$ or $+1$. (Since the a's are distinct, the differences in (6.4) all have the same sign, so every run is monotone.) A run of length 1 is of either type, at pleasure. For example, let

$R = 5, 6, 1, 4, 3, 2.$

The first run is 5, 6, the second 1, the last 4, 3, 2. 5, 6 is an ascending run of length two, 4, 3, 2 a descending run of length three, and 1 a run of length one.
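A minimal sketch of conditions (6.3) to (6.6), assuming R is given as a Python list holding a permutation of 1, ..., n; `perm_runs` and `kind` are our names. Because a run is monotone, its first step decides whether it is ascending or descending.

```python
def perm_runs(a):
    """Split a permutation into its runs: maximal blocks of consecutive
    positions whose neighbouring values differ by exactly 1."""
    if not a:
        return []
    blocks = [[a[0]]]
    for prev, cur in zip(a, a[1:]):
        if abs(cur - prev) == 1:
            blocks[-1].append(cur)   # (6.4): still inside the run
        else:
            blocks.append([cur])     # (6.5)/(6.6): a new run starts here
    return blocks


def kind(block):
    """'ascending' or 'descending'; a run of length 1 may be either, so 'single'."""
    if len(block) == 1:
        return "single"
    return "ascending" if block[1] - block[0] == 1 else "descending"


R = [5, 6, 1, 4, 3, 2]
print([(b, kind(b)) for b in perm_runs(R)])
# [([5, 6], 'ascending'), ([1], 'single'), ([4, 3, 2], 'descending')]
```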

$H^*(x_1, x_2)$ is a degenerate distribution function such that the relation between $X_1$ and $X_2$ is functional (this is a special case of stochastic relationship). That is to say, $X_2 = \varphi(X_1)$, where $\varphi(X_1)$ is a single-valued function defined for all the possible values of $X_1$, with a single-valued inverse $\varphi^{-1}(X_2)$ defined for all possible values of $X_2$. Hence $H^*(x_1, x_2)$ is completely specified when the function $X_2 = \varphi(X_1)$ and $h_1^*(x_1)$, the marginal distribution function of $X_1$, are given ($h_1^*(x_1)$ must of course be continuous).

Consider a system of intervals on the line $-\infty < x_1 < +\infty$, of which $(i - 1, i)$





is the ith, $i = 1, 2, \ldots, n$, and a similar system on the line $-\infty < x_2 < +\infty$. (Actually, as in the previous section, neither the length of the intervals nor their location is material. The intervals need merely be disjunct and in a certain order. We are using these particular intervals to simplify the notation.) Let $l_1$ be the length of the first run; $a_1$ is its first element. Then let

$p_1 = P\{0 < X_1 < l_1;\ h_1^*(x_1)\}$

be one of the as yet undetermined parameters. We now partly define $h_1^*(x_1)$ as follows:

(6.7) $h_1^*(x_1) = 0$ for $x_1 < 0$; $\quad h_1^*(x_1) = 1$ for $x_1 > n$; $\quad h_1^*(l_1) = p_1$.

Within the interval $(0, l_1)$, $h_1^*(x_1)$ may be any continuous monotonic increasing function which satisfies (6.7). We partly define $\varphi(X_1)$ as follows:

If the first run is ascending, let

(6.8) $\varphi(0) = a_1 - 1$,

(6.9) $\varphi(x_1) = a_1 - 1 + x_1, \quad 0 < x_1 < l_1$.

If the first run is descending, let

(6.10) $\varphi(0) = a_1$,

(6.11) $\varphi(x_1) = a_1 - x_1, \quad 0 < x_1 < l_1$.

We proceed in this manner through all the runs of R. Let $l_i$ be the length of the ith run. Let $\lambda_j = \sum_{i<j} l_i$. The first element of the jth run is $a_{(\lambda_j + 1)}$. Let

$p_j = P\{\lambda_j < X_1 < \lambda_j + l_j;\ h_1^*(x_1)\}$

be another of the as yet undetermined parameters. We then define $h_1^*(x_1)$ as follows:

(6.12) $h_1^*(\lambda_j) = \sum_{i<j} p_i$,

(6.13) $h_1^*(\lambda_j + l_j) = \sum_{i \le j} p_i$.

Within the interval $(\lambda_j, \lambda_j + l_j)$, $h_1^*(x_1)$ may be any continuous monotonic increasing function which satisfies (6.12) and (6.13). We define $\varphi(X_1)$ as follows:
If the jth run is ascending, let

(6.14) $\varphi(x_1) = a_{(\lambda_j+1)} - 1 + (x_1 - \lambda_j) \quad (\lambda_j < x_1 < \lambda_j + l_j)$.

If the jth run is descending, let

(6.15) $\varphi(x_1) = a_{(\lambda_j+1)} - (x_1 - \lambda_j) \quad (\lambda_j < x_1 < \lambda_j + l_j)$.

If $l_j = 1$, the run may be considered ascending or descending at pleasure.
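A sketch of the function φ for a given permutation. Here we read (6.14) and (6.15) with the offset $x_1 - \lambda_j$, so that for the first run they reduce to (6.9) and (6.11). `build_phi` is a hypothetical helper, and only φ is built, not $h_1^*$. The check confirms the property the construction is meant to have: the ith interval $(i - 1, i)$ of the $x_1$-line is carried into the $a_i$th interval $(a_i - 1, a_i)$ of the $x_2$-line.

```python
def build_phi(a):
    """Piecewise-linear phi of (6.8)-(6.15) for a permutation a_1..a_n,
    defined on (0, n) except at the interval endpoints."""
    n = len(a)
    pieces = []   # (lambda_j, l_j, first element of run j, +1 ascending / -1 descending)
    start = 0
    for t in range(1, n + 1):
        if t == n or abs(a[t] - a[t - 1]) != 1:
            length = t - start
            step = 1 if length == 1 or a[start + 1] - a[start] == 1 else -1
            pieces.append((start, length, a[start], step))
            start = t

    def phi(x1):
        for lam, length, first, step in pieces:
            if lam < x1 < lam + length:
                if step == 1:
                    return first - 1 + (x1 - lam)   # (6.14)
                return first - (x1 - lam)           # (6.15)
        raise ValueError("x1 is an interval endpoint or lies outside (0, n)")

    return phi


R = [5, 6, 1, 4, 3, 2]
phi = build_phi(R)
for i, a_i in enumerate(R, start=1):
    y = phi(i - 0.5)          # a point of the i-th interval (i - 1, i)
    assert a_i - 1 < y < a_i  # lands in the a_i-th interval on the x_2-line
```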





In order to obtain R, it is necessary that all the elements of a run shall fall into its corresponding interval. Then it is easy to see that by the multinomial theorem

(6.16) $P\{R; H^*(x_1, x_2)\} = n! \prod_i (l_i!)^{-1} p_i^{\,l_i}$.

The right member of (6.16) is to be maximized with respect to the $p_i$, subject to the constraint

(6.17) $\sum_i p_i = 1$.

It is easy to verify that the maximum occurs when

(6.18) $p_i = \dfrac{l_i}{n}$.

M(R) is the right member of (6.16) after the results (6.18) have been inserted. It is convenient to remove all factors which are functions only of n and to take the logarithm of the resulting expression. Then with a slight change of notation we may say that

(6.19) $M(R) = \sum_i \bar{l}_i$,

where

(6.20) $\bar{l}_i = \log\left( \dfrac{l_i^{\,l_i}}{l_i!} \right)$.

The critical values of M(R) are the large values. One may verify without much difficulty that $M(R) = M(R')$, i.e., that the statistic is symmetric with respect to $X_1$ and $X_2$, as indeed it should be. (R' is the inverse permutation of R, and inversion carries the runs of R into runs of R' of the same lengths.)
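A minimal sketch of (6.19) and (6.20), with a randomized check of the symmetry $M(R) = M(R')$. It uses the fact that R' is the inverse permutation of R; the helper names are ours, and the check compares the multisets of run lengths, which is stronger than comparing M values.

```python
import math
import random


def bar(l):
    return l * math.log(l) - math.lgamma(l + 1)


def run_lengths(a):
    """Lengths of the runs of a permutation, in order of occurrence."""
    lengths, current = [], 1
    for prev, cur in zip(a, a[1:]):
        if abs(cur - prev) == 1:
            current += 1
        else:
            lengths.append(current)
            current = 1
    if a:
        lengths.append(current)
    return lengths


def m_statistic(a):
    """M(R) of (6.19)-(6.20): sum over runs of log(l**l / l!)."""
    return sum(bar(l) for l in run_lengths(a))


def inverse(a):
    """Inverse permutation; for R this is R'."""
    inv = [0] * len(a)
    for position, value in enumerate(a, start=1):
        inv[value - 1] = position
    return inv


rng = random.Random(1)
for _ in range(1000):
    a = list(range(1, 9))
    rng.shuffle(a)
    assert sorted(run_lengths(a)) == sorted(run_lengths(inverse(a)))
print(round(m_statistic([5, 6, 1, 4, 3, 2]), 3))  # 2.197
```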

This result is immediately extensible to the problem of testing whether k stochastic variables $X_1, \ldots, X_k$ are independent. We shall not go into the details, which are similar to those described above, and content ourselves with giving the definition of a run for the case k = 3. After the observations on $X_1$ have been arranged in ascending order, we obtain two sequences $R_1$ and $R_2$, the associated ranks of the observations on $X_2$ and $X_3$. Let $R_1 = b_1, b_2, \ldots, b_n$ and $R_2 = b_1', b_2', \ldots, b_n'$. The ascending sequence of consecutive integers $(i + 1), (i + 2), \ldots, (i + l)$ determines a run of length l if the sequences $b_{(i+1)}, b_{(i+2)}, \ldots, b_{(i+l)}$ and $b'_{(i+1)}, b'_{(i+2)}, \ldots, b'_{(i+l)}$ both satisfy (6.4), and if at least one of the sequences satisfies (6.5), and at least one, but not necessarily the same one, satisfies (6.6). The adjectives ascending and descending apply to each sequence separately.

Let $r_j$ be the number of runs of length j in R. Then

(6.21) $M(R) = \sum_j \bar{j}\, r_j$.
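For k = 3 the definition amounts to cutting the positions wherever either rank sequence has a jump larger than 1. The sketch below is our reading of that definition, with `joint_run_lengths` and `m_from_counts` as hypothetical names and made-up rank sequences listed in the order of the sorted $X_1$ observations. It then evaluates (6.21).

```python
import math
from collections import Counter


def bar(j):
    return j * math.log(j) - math.lgamma(j + 1)


def joint_run_lengths(*seqs):
    """Run lengths for rank sequences R_1, R_2, ...: inside a run every
    sequence has neighbouring differences of +-1 (6.4); a run ends where at
    least one sequence jumps by more than 1 (6.5), (6.6)."""
    n = len(seqs[0])
    lengths, current = [], 1
    for t in range(1, n):
        if all(abs(s[t] - s[t - 1]) == 1 for s in seqs):
            current += 1
        else:
            lengths.append(current)
            current = 1
    if n:
        lengths.append(current)
    return lengths


def m_from_counts(lengths):
    """M(R) by (6.21): sum over j of bar(j) * r_j."""
    r = Counter(lengths)
    return sum(bar(j) * r_j for j, r_j in r.items())


R1 = [2, 3, 4, 1, 6, 5]   # made-up ranks of the X_2 observations
R2 = [5, 4, 3, 6, 1, 2]   # made-up ranks of the X_3 observations
print(joint_run_lengths(R1, R2))  # [3, 1, 2]
```

In the first run, $R_1$ ascends while $R_2$ descends, which is why the adjectives apply to each sequence separately.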

Most of the remarks made in Section 5 about the small sample distribution of T(V) are also applicable to the distribution of M(R). More will be said in the





next section about the distribution of M(R), which involves the solution of a combinatorial problem not discussed in the literature.

7. On the distribution of W(R). While most of the remarks made about the small sample distribution of T(V) apply to the question of the distribution of M(R) in small samples, the situation with respect to the distribution of M(R) in samples of medium and large size is very different and, in certain respects, is more favorable for practical application than is the case with T(V). It would be reasonable to expect, for example, in view of Section 3 and of the structure of the statistic M(R), that the asymptotic distribution of M(R) should be normal. Surprisingly enough, this is not the case. It is not even continuous. In order to clarify the situation, we begin with a few necessary ideas and definitions.

Let the stochastic variable W(R) be defined as the total number, in R, of runs in the sense of Section 6. We shall be interested in the distribution of W(R). The number n of the pairs of observations on $X_1$ and $X_2$ (we consider the case of two variates) will be assumed arbitrary but fixed throughout the discussion and will not be exhibited. Let N(k) be the number of sequences R (of the integers 1 to n) which contain exactly k runs.

Consider, for example, for the case n = 6, the sequence 2 3 4 6 5 1. We shall say that this sequence contains the "contacts" (2, 3), (3, 4), (6, 5). In general, a contact is defined as the juxtaposition, in the sequence R, of consecutive numbers, whether in ascending or descending order. If k is the number of runs and l the number of contacts in a sequence R, then obviously (a run of length $l_j$ contains exactly $l_j - 1$ contacts, and every contact lies inside a run)

(7.1) $k + l = n$.
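Relation (7.1) and the numbers N(k) can be checked by brute force for small n. This sketch, with our own helper names, enumerates all n! sequences R, asserts k + l = n for each, and tabulates N(1), ..., N(n). Only the sequences 1, 2, ..., n and n, ..., 2, 1 consist of a single run, so N(1) = 2 for n ≥ 2.

```python
from collections import Counter
from itertools import permutations


def contacts(a):
    """Adjacent pairs in the sequence whose values are consecutive integers."""
    return [(p, q) for p, q in zip(a, a[1:]) if abs(p - q) == 1]


def number_of_runs(a):
    """W(R): a run starts at the first position and after every non-contact."""
    if not a:
        return 0
    return 1 + sum(1 for p, q in zip(a, a[1:]) if abs(p - q) != 1)


def n_of_k(n):
    """[N(1), ..., N(n)] by enumerating all n! sequences R."""
    counts = Counter()
    for a in permutations(range(1, n + 1)):
        k = number_of_runs(a)
        assert k + len(contacts(a)) == n   # relation (7.1)
        counts[k] += 1
    return [counts[k] for k in range(1, n + 1)]


print(contacts([2, 3, 4, 6, 5, 1]))   # [(2, 3), (3, 4), (6, 5)]
print(n_of_k(6))
```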

Let $R_0$ be the sequence $1, 2, \ldots, n$ of the first n integers in ascending order. The $n - 1$ contacts of this sequence may themselves be arranged in a sequence R* of contacts, thus:

$(1, 2), (2, 3), \ldots, (n - 1, n)$.

Suppose l of the contacts which constitute the sequence R* are selected in some manner to form the set O. The remaining $n - 1 - l$ contacts form the complementary set O'. After this selection the sequence R* may be considered a sequence of the type of the sequences V of Section 5, with the members of O playing the role of the elements 0 and the members of O' playing the role of the elements 1. When R* is considered in this manner we will write it as R*(O). The definition of a run of Section 5 as applied to sequences V is now applicable to R*(O). We will call any such run of the members of O or of O' a group.

We wish first to answer the following question: in how many ways can the set O be selected from among the elements of $R^*$ so that it will contain l members arranged in $R^*(O)$ in i groups? If, for a given O, i' is the number of groups into which O' is divided in $R^*(O)$, it is clear that i' − i can equal only −1, 0, or +1, since groups of O and groups of O' alternate along $R^*$. Hence only four situations can arise, as follows.
a) i' = i + 1. The first group in $R^*(O)$ (and also the last) is therefore composed of elements of O'. The number of ways in which l elements can be divided into i runs of the type of Section 2 is the coefficient of $x^l$ in the purely formal expansion of

$$(x + x^2 + x^3 + \cdots)^i = \frac{x^i}{(1-x)^i},$$

and is therefore $\binom{l-1}{i-1}$. Similarly, n − 1 − l elements can be divided into i' = i + 1 runs in $\binom{n-l-2}{i}$ ways. Hence this situation will arise in

$$\binom{l-1}{i-1}\binom{n-l-2}{i}$$

ways.

b) i' = i − 1. By an argument similar to the above, this can occur in

$$\binom{l-1}{i-1}\binom{n-l-2}{i-2}$$

ways.

c) i' = i and the first group is made up of elements from O. This will occur in

$$\binom{l-1}{i-1}\binom{n-l-2}{i-1}$$

ways.

d) i' = i and the first group is made up of elements from O'. This will also occur in $\binom{l-1}{i-1}\binom{n-l-2}{i-1}$ ways.

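The set O which contains l elements arranged in i groups can therefore be selected in

$$\binom{l-1}{i-1}\left[\binom{n-l-2}{i} + 2\binom{n-l-2}{i-1} + \binom{n-l-2}{i-2}\right] \tag{7.2}$$

ways, and the quantity (7.2) is, by elementary combinatorics, equal to

$$\binom{l-1}{i-1}\binom{n-l}{i}. \tag{7.3}$$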
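The "elementary combinatorics" that takes (7.2) to (7.3) can be spelled out as follows, writing m = n − l − 2 (for m ≥ 0). The bracket in (7.2) is the coefficient of $x^i$ in $(1+x)^2(1+x)^m$. As a consistency check, summing (7.3) over i (by Vandermonde's identity) recovers the total number of ways of choosing l of the n − 1 contacts:

$$\binom{m}{i} + 2\binom{m}{i-1} + \binom{m}{i-2} = [x^i]\,(1 + 2x + x^2)(1+x)^m = [x^i]\,(1+x)^{m+2} = \binom{n-l}{i},$$

$$\sum_{i \ge 1} \binom{l-1}{i-1}\binom{n-l}{i} = \sum_{i} \binom{l-1}{l-i}\binom{n-l}{i} = \binom{n-1}{l}.$$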


Let any set O of l contacts divided into i groups be selected from $R^*$. Imagine that each contact in O sets up, in $R_0$, an unbreakable bond which links the two elements involved in the contact, but that no contact in O' creates such a bond. Given these bonds set up by O, we seek the number of different sequences into which the n elements of $R_0$ can be permuted while respecting these bonds. Since there are l bonds, we can actually manipulate only n − l entities, except that two elements linked by a bond may have their order reversed; for example, if O contains (1, 2), 1 may either precede or follow 2 and the bond would still be respected. However, if one contact in a group is reversed, the group as a whole must be reversed, else a bond would be broken. Hence each of the i groups can appear in exactly two orders, and the number of distinct sequences into which $R_0$ may be permuted while all the bonds set up by O are respected is $2^i (n-l)!$.
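Both (7.3) and the family size $2^i(n-l)!$ can be checked by brute force for small n. The sketch below encodes the contact (j, j + 1) of $R^*$ by the integer j. The helper names are hypothetical, and full enumeration is only practical for n up to about 8.

```python
from itertools import combinations, permutations
from math import comb, factorial

def groups(chosen):
    """Number of groups (maximal runs) formed by the chosen contacts in R*.
    The contact (j, j + 1) is encoded by the integer j."""
    s = set(chosen)
    # A group starts at every chosen contact whose left neighbour in R* is not chosen.
    return sum(1 for j in s if j - 1 not in s)

def count_selections(n, l, i):
    """Number of sets O of l contacts (out of the n - 1 in R*) forming exactly i groups."""
    return sum(1 for c in combinations(range(1, n), l) if groups(c) == i)

def family_size(n, chosen):
    """Number of permutations of 1..n in which j and j + 1 are adjacent for every chosen j."""
    count = 0
    for p in permutations(range(1, n + 1)):
        pos = {v: k for k, v in enumerate(p)}
        if all(abs(pos[j] - pos[j + 1]) == 1 for j in chosen):
            count += 1
    return count

n = 6
for l in range(1, n):
    for i in range(1, l + 1):
        assert count_selections(n, l, i) == comb(l - 1, i - 1) * comb(n - l, i)  # (7.3)

for chosen in [(1,), (1, 2), (1, 3), (2, 3, 5)]:
    l, i = len(chosen), groups(chosen)
    assert family_size(n, chosen) == 2**i * factorial(n - l)                    # 2^i (n - l)!
```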

Let us refer to the sequences thus obtained as the family generated by O. All the sequences in a family are distinct. Now let O range over all sets of l contacts selected from $R^*$. The various families obtained will not be disjoint; some will have sequences in common. In spite of this, we seek the total of the number of sequences in all the families. The total of the number of sequences in all the families generated by sets of l contacts divided into i groups is, by (7.3) and the result of the preceding paragraph,

$$2^i \binom{l-1}{i-1}\binom{n-l}{i}(n-l)!. \tag{7.4}$$

Sets of l contacts may consist of 1, 2, ..., l groups, so that the total number of sequences in all the families generated by sets of l contacts is

$$A_l = \sum_{i=1}^{l} 2^i \binom{l-1}{i-1}\binom{n-l}{i}(n-l)!, \tag{7.5}$$

where l may take the values 1, 2, ..., (n − 1). The conventions on the combinatorial symbols will be:

$$\binom{a}{0} = 1, \quad a \ge 0; \qquad \binom{a}{b} = 0, \quad a < b.$$

Define $A_0$, the total number of sequences R, as

$$A_0 = n!. \tag{7.6}$$

The following equation is trivial:

$$A_0 = \sum_{i=1}^{n} N(i). \tag{7.7}$$


We now consider all the families generated by sets O which contain exactly l contacts. As was said before, the total of the number of sequences in these families is $A_l$. Let H(l) be the set of all the sequences in all these families, with each sequence in H(l) counted as many times as the number of families in which it occurs. Every sequence in H(l) has the l contacts of the set O which generated it, but after permuting $R_0$ other contacts may still exist. Hence every sequence in H(l) has at least l contacts and therefore, by (7.1), at most n − l runs. Clearly, a sequence which has exactly l contacts occurs exactly once in H(l), since it can appear only in the family generated by the set O of its l contacts and in no other family. A sequence which has exactly (l + 1) contacts will appear exactly $\binom{l+1}{l}$ times in H(l), for it will appear once in each family generated by a set O which consists of one of the $\binom{l+1}{l}$ selections of l contacts from among its (l + 1) contacts, and in no other family. Similarly, each sequence which has exactly (l + 2) contacts will appear in H(l) $\binom{l+2}{l}$ times, and so forth. We therefore obtain, in view of (7.1),

$$A_l = \sum_{j=l}^{n-1} \binom{j}{l} N(n-j) \qquad (l = 1, 2, \ldots, n-1). \tag{7.8}$$
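Relation (7.8) can be checked numerically. The sketch below evaluates $A_l$ from (7.5) and compares it with a direct tally over all n! sequences, where a sequence with j contacts contributes $\binom{j}{l}$, exactly as in the argument above. It is an illustration for small n only.

```python
from itertools import permutations
from math import comb, factorial

def A(n, l):
    """A_l from (7.5); A_0 = n! by (7.6)."""
    if l == 0:
        return factorial(n)
    return sum(2**i * comb(l - 1, i - 1) * comb(n - l, i) * factorial(n - l)
               for i in range(1, l + 1))

def contacts(seq):
    """Number of adjacent pairs in seq that are consecutive integers."""
    return sum(1 for a, b in zip(seq, seq[1:]) if abs(a - b) == 1)

def A_by_enumeration(n, l):
    """Size of H(l): each sequence with j contacts is counted C(j, l) times."""
    return sum(comb(contacts(p), l) for p in permutations(range(1, n + 1)))

n = 6
for l in range(n):
    assert A(n, l) == A_by_enumeration(n, l)
```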

The system of n linear equations (7.7) and (7.8) completely determines the quantities N(1), N(2), ..., N(n). The matrix of these equations is triangular with unit diagonal, so its determinant has absolute value 1, and the quantities N(1), N(2), ..., N(n) may readily be expressed in determinantal form. Furthermore the moments of W(R) are readily found from these equations. Thus from (7.8) for l = 1, which gives $A_1 = n!\,E(n - W(R))$, we find

$$E(W(R)) = \frac{n^2 - 2n + 2}{n}, \tag{7.9}$$

and from (7.8) for l = 2 (which gives $A_2 = n!\,E\binom{n - W(R)}{2}$) and l = 1 we find, after a little obvious manipulation,

$$\sigma^2(W(R)) = \frac{2(n-2)(n^2 - 2n - 1)}{n^2(n-1)} \to 2 \quad (n \to \infty). \tag{7.10}$$

Higher moments of W(R) may be found in similar manner.
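The matrix of (7.7) and (7.8) is triangular with unit diagonal, so the system can be solved by back substitution, starting from $N(1) = A_{n-1}$. The sketch below does this in exact rational arithmetic and compares the resulting mean and variance of W(R) with (7.9) and (7.10). It is one straightforward way to carry out the computation, not necessarily the author's.

```python
from fractions import Fraction
from math import comb, factorial

def A(n, l):
    """A_l from (7.5); A_0 = n! by (7.6)."""
    if l == 0:
        return factorial(n)
    return sum(2**i * comb(l - 1, i - 1) * comb(n - l, i) * factorial(n - l)
               for i in range(1, l + 1))

def run_counts(n):
    """N(1), ..., N(n) from (7.7)-(7.8) by back substitution.

    Equation l reads A_l = sum_{j=l}^{n-1} C(j, l) N(n - j). The j = l term has
    coefficient 1, so solving from l = n - 1 down to l = 0 (which is (7.7))
    needs no division."""
    N = {}
    for l in range(n - 1, -1, -1):
        known = sum(comb(j, l) * N[n - j] for j in range(l + 1, n))
        N[n - l] = A(n, l) - known
    return N

def mean_and_variance(n):
    """Exact mean and variance of W(R) over the n! equally likely sequences R."""
    N = run_counts(n)
    total = factorial(n)
    mean = Fraction(sum(k * N[k] for k in N), total)
    second = Fraction(sum(k * k * N[k] for k in N), total)
    return mean, second - mean**2

for n in range(3, 12):
    mean, var = mean_and_variance(n)
    assert mean == Fraction(n * n - 2 * n + 2, n)                               # (7.9)
    assert var == Fraction(2 * (n - 2) * (n * n - 2 * n - 1), n * n * (n - 1))  # (7.10)

print(run_counts(4))  # {1: 2, 2: 10, 3: 10, 4: 2}
# Variance for growing n, approaching the limit 2 slowly.
print([round(float(mean_and_variance(n)[1]), 3) for n in (10, 20, 40)])
```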

The number of contacts, n − W(R), takes only non-negative integer values, and by (7.9) and (7.10) it has mean less than 2 and limiting variance 2. It follows that the asymptotic distribution is not continuous. For n of any size, the bulk of the values is concentrated in a short interval ending at n. When W(R) = n, M(R) = 0; when W(R) = n − 1, M(R) = log 2; and when W(R) = n − 2, M(R) takes one of only two values, one of which is log 4. It is easy to see that for the values of W(R) which differ very little from n there are only a small number of values of M(R), whose asymptotic distribution is also discontinuous. The statistic W(R) is therefore a good approximation to the statistic M(R) for the purposes of tests of significance (for M(R) the large values are the critical values, and for W(R) the small values are critical), and it has a few additional practical advantages. It is even easier to compute than M(R); the computation is best performed by counting contacts. Since the limiting variance is a small constant, many tests of significance can be performed simply by use of Tchebycheff's inequality.

For example, suppose a given large sample contains 9 contacts, i.e., n − 9 runs. (We say a "large" sample in order to use the simple limiting mean and variance; if desired, or for a small sample, these may be computed exactly by (7.9) and (7.10).) The value n − 9 lies 7 below the limiting mean n − 2 of W(R), and the limiting variance is 2. Tchebycheff's inequality therefore shows that the probability of obtaining n − 9 or fewer runs is at most 2/7² = 2/49, which is less than .041. Thus the presence of 9 contacts would be sufficient to render a sample of great size significant on a 5% level.

For the few numbers of contacts about which doubt will exist as to whether or not they are critical values, two procedures are possible. Either the equations (7.7) and (7.8) may be solved exactly for the doubtful values, or several higher moments may be found from (7.8) and the methods of Wald [14] applied to delimit the missing probabilities to any accuracy desired. By enumerating the few values of M(R) which correspond to several of the largest values of W(R), the distribution of M(R) may be computed sufficiently to serve the purposes of tests of significance.
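The Tchebycheff calculation in the example can be reproduced in a few lines. The sketch uses only the limiting values quoted above (mean 2 for the number of contacts n − W(R), and variance 2) together with the two-sided inequality P(|X − μ| ≥ t) ≤ σ²/t². It also finds the smallest number of contacts that this bound makes significant at the 5% level.

```python
LIMIT_MEAN_CONTACTS = 2   # n - E(W(R)) tends to 2, by (7.9)
LIMIT_VARIANCE = 2        # sigma^2(W(R)) tends to 2, by (7.10)

def tchebycheff_bound(c):
    """Upper bound, for large n, on P(at least c contacts) = P(W(R) <= n - c),
    from P(|X - mu| >= t) <= sigma^2 / t^2 with t = c - 2."""
    t = c - LIMIT_MEAN_CONTACTS
    if t <= 0:
        return 1.0
    return min(1.0, LIMIT_VARIANCE / t**2)

print(round(tchebycheff_bound(9), 4))   # 0.0408, i.e. 2/49 < .041

# Smallest number of contacts that this bound declares significant at 5%.
smallest = next(c for c in range(1, 100) if tchebycheff_bound(c) <= 0.05)
print(smallest)                          # 9
```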





REFERENCES

[1] Maurice Fréchet, Généralités sur les Probabilités. Variables Aléatoires, Paris (1937).

[2] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. 147.

[3] A. M. Mood, Annals of Math. Stat., Vol. 11 (1940), p. 367.

[4] J. Neyman, Phil. Trans. Roy. Soc. London, Vol. 231 (1937), pp. 333-380.

[5] A. Wald and J. Wolfowitz, Annals of Math. Stat., Vol. 10 (June, 1939), p. 105.

[6] H. Hotelling and M. Pabst, Annals of Math. Stat., Vol. 7 (1936), p. 29.

[7] E. G. Olds, Annals of Math. Stat., Vol. 9 (1938), p. 133.

[8] W. R. Thompson, Annals of Math. Stat., Vol. 9 (1938), p. 281.

[9] W. J. Dixon, Annals of Math. Stat., Vol. 11 (June, 1940), p. 199.

[10] Milton Friedman, Jour. Amer. Stat. Assoc., Vol. 32 (1937), p. 675.

[11] J. Neyman and E. Pearson, Trans. Royal Soc., A, Vol. 231 (1933), p. 295.

[12] A. Wald, Bull. Amer. Math. Soc., Vol. 46 (1940), p. 235.

[13] A. Wald, Bull. Amer. Math. Soc., Vol. 47 (1941), p. 396.

[14] A. Wald, Trans. Amer. Math. Soc., Vol. 46 (1939), p. 280.

[15] Harald Cramér, Random Variables and Probability Distributions, Cambridge (1937).

[16] F. Hausdorff, Mengenlehre (Second Edition), Berlin and Leipzig (1927).



ON THE THEORY OF TESTING COMPOSITE HYPOTHESES 
WITH ONE CONSTRAINT 

By Henry Scheffé
Princeton University 

1. Introduction. Our purpose is to extend some of the Neyman-Pearson theory of testing hypotheses to cover certain cases of frequent interest which are complicated by the presence of nuisance parameters. Our results give methods of finding critical regions of types B and $B_1$. Type B regions were defined by Neyman [1] for the case of one nuisance parameter. Type $B_1$ regions are the natural generalization of the type $A_1$ regions of Neyman and Pearson [5] to permit the occurrence of nuisance parameters. The reader familiar with the work of these authors will recognize most of the notation and some of the methods.

We consider a joint distribution of n random variables $x_1, x_2, \ldots, x_n$, depending on l parameters $\theta_1, \theta_2, \ldots, \theta_l$, $l \le n$. The functional form of the distribution is given. The random variables may be regarded as the coordinates of a point E in an n-dimensional sample space W, and the parameters as the coordinates of a point θ in an l-dimensional space Ω of admissible parameter values. Ω, unlike W, in general will not be a complete Euclidean space. Let ω denote the subspace of Ω defined by $\theta_1 = \theta_1^0$. The hypothesis we consider is

$$H_0:\ \theta \in \omega.$$

Neyman and Pearson [4] call $H_0$ a hypothesis with l − 1 degrees of freedom; for our present purpose we shift the emphasis by saying it has one constraint.

It is clear that whenever we test whether a parameter has a given value, and other parameters occur in the distribution, we are testing a hypothesis with one constraint. Hypotheses of the type $\theta_1 = \theta_2$, in which we specify neither the common value of $\theta_1$ and $\theta_2$ nor the values of any other parameters, may always be transformed to $H_0$ by choosing new parameters. In general, the hypothesis that the parameter point θ lies on some hypersurface $g(\theta_1, \theta_2, \ldots, \theta_l) = g_0$ in Ω may be transformed to $H_0$ if the function g satisfies certain conditions, say, if g is continuous and monotone-increasing in one of the θ's for all θ in Ω. Another circumstance lending importance to the theory of testing hypotheses with one constraint is its connection with the theory of confidence intervals, which we shall point out below.
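As an illustration of such a change of parameters (one possible choice, not taken from the text), the hypothesis $\theta_1 = \theta_2$ becomes a hypothesis of the form $H_0$ with hypothetical value 0 under

$$\theta_1^* = \theta_1 - \theta_2, \qquad \theta_j^* = \theta_j \ \ (j = 2, \ldots, l), \qquad \theta_1 = \theta_2 \iff \theta_1^* = 0.$$

Likewise, if g is continuous and monotone-increasing in $\theta_1$ (say), one may take $\theta_1^* = g(\theta_1, \ldots, \theta_l)$ with the other coordinates unchanged. The hypersurface $g = g_0$ then becomes $\theta_1^* = g_0$.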

The path which led Neyman to critical regions of type B is the following. Every Borel-measurable region w of sample space determines a test of $H_0$, which consists of rejecting $H_0$ if and only if E falls in w. In deciding which is a most efficient test, one may limit the competition to similar¹ regions, if such exist. Because of the general non-existence [2, p. 372] of uniformly most powerful tests, one is led to consider common best critical regions [4] if he is interested only in alternatives $\theta_1 < \theta_1^0$ (or $\theta_1 > \theta_1^0$), or else regions giving an unbiased test [1, p. 251]. Narrowing the competition further to the latter class of regions, one is led to regions of type B if he seeks tests which are most powerful for $\theta_1$ very near to $\theta_1^0$, and to type $B_1$ regions if he is not content with this. These types of regions are defined in section 2.

¹ Defined by condition (a) of definition 1.

We may now state the relationship of hypotheses with one constraint to the theory of confidence intervals [2]. To find confidence intervals for $\theta_1$, we must first find similar regions $w(\theta_1^0)$ for testing $H_0$. If with every admissible $\theta_1^0$ we can associate a $w(\theta_1^0)$, then confidence regions for $\theta_1$ are determined, and if these be intervals, they are confidence intervals. Every class of similar regions mentioned above is intimately related to a category of confidence intervals. In particular, to find Neyman's short unbiased confidence intervals we must first solve the problem of type B regions. Likewise, if we define shortest unbiased confidence intervals in the obvious way along the lines laid down by Neyman, their discovery rests on the solution of the problem of type $B_1$ regions.
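Here is a toy illustration of this inversion, built on assumptions that are not in the text. There is a single normal mean $\theta_1$ with known standard deviation and no nuisance parameters, and $w(\theta_1^0)$ is the usual two-sided region of size α. The confidence set is the set of values $\theta_1^0$ whose region does not contain the observed sample. The grid scan and the function names are purely illustrative.

```python
from statistics import NormalDist

def in_region(xbar, theta0, sigma, n, alpha=0.05):
    """True if the sample mean xbar falls in w(theta0), the two-sided region
    |xbar - theta0| > z * sigma / sqrt(n) of size alpha (known sigma assumed)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(xbar - theta0) > z * sigma / n**0.5

def confidence_set(xbar, sigma, n, grid, alpha=0.05):
    """Grid values theta0 that are NOT rejected: the inverted family of regions."""
    return [t for t in grid if not in_region(xbar, t, sigma, n, alpha)]

grid = [k / 1000 for k in range(-3000, 3001)]
accepted = confidence_set(xbar=0.4, sigma=1.0, n=25, grid=grid)
print(min(accepted), max(accepted))   # roughly 0.4 -/+ 1.96 * 0.2
```

With nuisance parameters, the same inversion applies once each $w(\theta_1^0)$ has been made similar, for example by the type B construction of section 4.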

While the assumptions of section 3, especially 3°, are unpleasantly restrictive (they are obviously tailored to fit the proof rather than the problem), they are nevertheless satisfied in many sampling problems associated with normal distributions. An application of the theorems of section 4 will be given in another paper, On the ratio of the variances of two normal populations. The present theory was needed to round out that paper and was originally planned as a section thereof. However, it seems desirable, for the convenience of other workers who might have use for the theory, not to bury it under the preceding title.

Section 5 consists of an appendix on the moment problem raised by assumption 5°.

2. Definitions. The symbols w, $w_0$, $w_1$ will always be understood to denote Borel-measurable regions in W. We shall symbolize $\partial^i \Pr\{E \in w \mid \theta\}/\partial\theta_1^i$ for i = 0, 1, 2 by $P(w \mid \theta)$, $P'(w \mid \theta)$, $P''(w \mid \theta)$, respectively. Since $\theta_1$ plays a distinguished rôle, it will often be convenient to write $\theta = (\theta_1, \vartheta)$, where the nuisance parameters are denoted by $\vartheta = (\theta_2, \theta_3, \ldots, \theta_l)$.

Definition 1: $w_0$ is said to be a type B region for testing $H_0$ if for all θ in ω

(a) $P(w_0 \mid \theta_1^0, \vartheta) = \alpha$, where α is independent of ϑ,

(b) $P'(w_0 \mid \theta_1^0, \vartheta)$ and $P''(w_0 \mid \theta_1^0, \vartheta)$ exist,

(c) $P'(w_0 \mid \theta_1^0, \vartheta) = 0$,

(d) $P''(w_0 \mid \theta_1^0, \vartheta) \ge P''(w_1 \mid \theta_1^0, \vartheta)$ for all $w_1$ satisfying (a), (b), (c).

Definition 2: $w_0$ is said to be of type $B_1$ if the conditions (a), (b'), (c), (d') are satisfied. The conditions (a), (c) are given in definition 1; the other two are

(b') $P'(w_0 \mid \theta_1, \vartheta)$ is continuous in $\theta_1$ at $\theta_1 = \theta_1^0$ for all θ in ω,

(d') $P(w_0 \mid \theta_1, \vartheta) \ge P(w_1 \mid \theta_1, \vartheta)$ for all $w_1$ satisfying (a), (b'), (c), and all θ in Ω.
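Conditions (a) and (c) can be made concrete with a one-parameter sketch. It has no nuisance parameter, which is an assumption made only for illustration: x ~ N(θ, 1), $\theta_1^0 = 0$, and w is the symmetric region |x| > 1.96. Its power function has size α = .05 and zero slope at $\theta_1^0$, as (c) requires. A one-sided region of the same size has positive slope there. Derivatives are approximated by central differences.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()
Z_TWO_SIDED = STD_NORMAL.inv_cdf(0.975)   # about 1.96
Z_ONE_SIDED = STD_NORMAL.inv_cdf(0.95)    # about 1.645

def power_two_sided(theta, theta0=0.0):
    """P(w | theta) for w = {x : |x - theta0| > 1.96}, with x ~ N(theta, 1)."""
    lower = STD_NORMAL.cdf(theta0 - Z_TWO_SIDED - theta)
    upper = 1.0 - STD_NORMAL.cdf(theta0 + Z_TWO_SIDED - theta)
    return lower + upper

def power_one_sided(theta, theta0=0.0):
    """P(w | theta) for w = {x : x - theta0 > 1.645}, with x ~ N(theta, 1)."""
    return 1.0 - STD_NORMAL.cdf(theta0 + Z_ONE_SIDED - theta)

def derivative(f, x, h=1e-5):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(round(power_two_sided(0.0), 4))          # 0.05: condition (a) with alpha = .05
print(abs(derivative(power_two_sided, 0.0)))   # essentially 0: condition (c)
print(derivative(power_one_sided, 0.0) > 0)    # True: the one-sided region violates (c)
```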





3. Assumptions. The following assumptions are made about the p.d.f. (probability density function) of $x_1, x_2, \ldots, x_n$, whose distribution depends on θ. Their formulation follows that of Neyman elsewhere.

1°. (a) There exists a p.d.f. $p(E \mid \theta)$ such that for any θ in Ω and any w,

$$P(w \mid \theta) = \int_w p(E \mid \theta)\, dW, \tag{1}$$

where dW denotes the volume element $dx_1\, dx_2 \cdots dx_n$.

(b) The region $W_+$ in W defined by $p(E \mid \theta) > 0$ is independent of θ for θ ∈ ω.

(c) The connectivity of ω is such that it is possible to pass from any point θ' in ω to any other point θ'' in ω by a path lying entirely in ω and consisting of a finite number of segments, on each of which all but one of $\theta_2, \theta_3, \ldots, \theta_l$ are constant.

2°. For all $E \in W_+$ and θ ∈ ω, $p(E \mid \theta)$ is differentiable twice with respect to $\theta_1$ and indefinitely with respect to $\theta_2, \theta_3, \ldots, \theta_l$. For any w and any θ ∈ ω, the corresponding derivatives of $P(w \mid \theta)$ exist and may be obtained by differentiating under the integral sign in (1).

We now define

$$\phi_i = \partial \log p(E \mid \theta)/\partial\theta_i = \phi_i(E, \theta), \qquad i = 1, 2, \ldots, l.$$

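3°. For all $E \in W_+$ and θ ∈ ω, $\phi_i(E, \theta)$ is continuous in E, i = 1, 2, ..., l, and

$$\frac{\partial\phi_i}{\partial\theta_j} = A_{ij} + \sum_{k=2}^{l} B_{ijk}\,\phi_k, \qquad i, j = 2, 3, \ldots, l, \tag{2}$$

$$\frac{\partial\phi_i}{\partial\theta_1} = A_{i1} + \sum_{k=1}^{l} B_{i1k}\,\phi_k, \qquad i = 1, 2, \ldots, l, \tag{3}$$

where $A_{ij} = A_{ij}(\theta_1^0, \vartheta)$ and $B_{ijk} = B_{ijk}(\theta_1^0, \vartheta)$ are continuous in each of $\theta_2, \theta_3, \ldots, \theta_l$.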

4°. The matrix $(\partial\phi_i/\partial x_j)$, i = 1, 2, ..., l; j = 1, 2, ..., n, contains an l × l minor which is non-singular² for all $E \in W_+$ and θ ∈ ω, and whose elements are continuous in E.

Write $\Phi = (\phi_2, \phi_3, \ldots, \phi_l)$, and denote by $p(\phi_1, \Phi \mid w, \theta)$ the p.d.f. of $(\phi_1, \Phi)$ calculated under the assumption that E ∈ w, i.e., that the p.d.f. of E is $p(E \mid \theta)/P(w \mid \theta)$ for E ∈ w and zero for E ∈ W − w. Define

$$Q_s(\Phi \mid w, \theta) = \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid w, \theta)\, d\phi_1. \tag{4}$$

² If for each θ ∈ ω, 4° is violated on an exceptional set U(θ) for which $P(U(\theta) \mid \theta) = 0$, theorems 1 and 2 may still be valid. What is essential is the existence of the p.d.f. $p(\phi_1, \phi_2, \ldots, \phi_l \mid \theta)$ for all θ ∈ ω. On reconsidering the theorems and their proofs, the reader will see that if the set U(θ) is deleted from $W_+$, then 1°(b) may be violated, but not seriously, and no essential changes are necessary. The addition of the necessary qualifying clauses to our statements, regarding sets of probability zero, would encumber the developments.


Let $w_1$ be any region satisfying condition (a) of definition 1.

5°. We assume, for each θ ∈ ω, that if the moments³ of $Q_s(\Phi \mid w_1, \theta)$ and $Q_s(\Phi \mid W, \theta)$ are the same, then these functions are equal for almost all Φ

(a) for s = 0,

(b) for s = 1.

Note that $Q_0$ is a p.d.f.; $Q_1$ is not.


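4. Theorems. A result of Neyman's [1] for l = 2 is generalized in the following theorem.⁴

Theorem 1. Under the assumptions 1° to 5°, consider the existence of functions $k_i(\Phi, \theta_1^0, \vartheta)$, i = 1, 2, such that $k_1 < k_2$ and

$$\int_{-\infty}^{k_1} \phi_1^s\, p(\phi_1, \Phi \mid W, \theta)\, d\phi_1 + \int_{k_2}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid W, \theta)\, d\phi_1 = \alpha \int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1, \Phi \mid W, \theta)\, d\phi_1, \qquad s = 0, 1, \tag{5}$$

for all $\Phi = (\phi_2, \phi_3, \ldots, \phi_l)$. If such functions exist for some $\theta = \theta' \in \omega$, they exist for all θ ∈ ω. Then the region $w_0$ in W defined by

$$\phi_1(E, \theta_1^0, \vartheta) < k_1(\Phi, \theta_1^0, \vartheta) \quad\text{or}\quad \phi_1(E, \theta_1^0, \vartheta) > k_2(\Phi, \theta_1^0, \vartheta) \tag{6}$$

is independent of ϑ, and is a region of type B for testing the hypothesis $H_0$.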
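The side conditions (5) can be solved numerically in the simplest case. The setting is an assumption made here only for illustration: no nuisance parameters, a normal sample of size ν with known mean μ, and $\theta_1 = \sigma^2$ with hypothetical value $\sigma_0^2$. Then $\phi_1 = (T - \nu)/(2\sigma_0^2)$ with $T = \sum (x_i - \mu)^2/\sigma_0^2 \sim \chi^2_\nu$, so region (6) reads $T < c_1$ or $T > c_2$. Condition (5) with s = 0 fixes the size. With s = 1, using $E\phi_1 = 0$ and $t f_\nu(t) = \nu f_{\nu+2}(t)$, it becomes $F_{\nu+2}(c_2) - F_{\nu+2}(c_1) = 1 - \alpha$. The sketch handles only even ν, so that the chi-square c.d.f. has an elementary closed form, and finds $c_1, c_2$ by nested bisection.

```python
import math

def chi2_cdf_even(x, nu):
    """Chi-square c.d.f. for an even number nu of degrees of freedom:
    F_nu(x) = 1 - exp(-x/2) * sum_{j < nu/2} (x/2)^j / j!."""
    if x <= 0.0:
        return 0.0
    half = x / 2.0
    term, partial = 1.0, 0.0
    for j in range(nu // 2):
        partial += term
        term *= half / (j + 1)
    return 1.0 - math.exp(-half) * partial

def bisect(f, lo, hi, iterations=100):
    """Root of f on [lo, hi]; assumes f(lo) < 0 <= f(hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_b_limits(nu, alpha=0.05):
    """c1 < c2 satisfying (5) for s = 0 and s = 1 in this chi-square example:
    F_nu(c2) - F_nu(c1) = 1 - alpha and F_{nu+2}(c2) - F_{nu+2}(c1) = 1 - alpha."""
    def F(x):
        return chi2_cdf_even(x, nu)

    def G(x):
        return chi2_cdf_even(x, nu + 2)

    top = 50.0 * nu + 100.0                              # effectively +infinity here
    c1_max = bisect(lambda x: F(x) - alpha, 0.0, top)    # F(c1_max) = alpha

    def c2_for(c1):
        # Upper limit that makes the size condition (s = 0) hold exactly.
        return bisect(lambda x: F(x) - F(c1) - (1.0 - alpha), c1, top)

    # G(c2) - G(c1) - (1 - alpha) is negative at c1 = 0 and positive at c1_max.
    c1 = bisect(lambda c: G(c2_for(c)) - G(c) - (1.0 - alpha), 0.0, c1_max)
    return c1, c2_for(c1)

c1, c2 = type_b_limits(10)
print(round(c1, 3), round(c2, 3))
```

The resulting limits are not the equal-tail chi-square points. The s = 1 condition is what forces the slope of the power function to vanish at $\sigma_0^2$.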

Since throughout the proof $\theta = (\theta_1^0, \vartheta)$, we shall write θ in place of these symbols to simplify the printing. It is to be understood that every statement in the proof involving the symbol θ is asserted for all θ in ω.

We suppose first that a type B region $w_0$ exists in $W_+$. Then from (a), (c) of definition 1 and assumptions 1°(a) and 2°,

$$\int_{w_0} p(E \mid \theta)\, dW = \alpha, \tag{7}$$

$$\int_{w_0} \phi_1\, p(E \mid \theta)\, dW = 0. \tag{8}$$

Since the value of the integral (7) is independent of ϑ, all its derivatives with respect to $\theta_2, \theta_3, \ldots, \theta_l$ must vanish. This leads [3, pp. 50, 51; insert $k_i$ before $\phi_i'$ in (15)] to

$$\frac{1}{\alpha}\int_{w_0} \prod_{i=2}^{l} \phi_i^{k_i}\, p(E \mid \theta)\, dW = M(k_2, k_3, \ldots, k_l \mid \theta), \qquad k_i = 0, 1, 2, \ldots, \tag{9}$$

where M is independent of $w_0$, and thus has the value obtained from (9) by putting $w_0 = W$ and α = 1. In particular, since by 2° $\int_W \phi_i\, p(E \mid \theta)\, dW = \partial\big(\int_W p\, dW\big)/\partial\theta_i = 0$,

$$\alpha^{-1}\int_{w_0} \phi_i\, p(E \mid \theta)\, dW = 0, \qquad i = 2, 3, \ldots, l. \tag{10}$$

³ By this term we include "product moments."

⁴ When I communicated this theorem to Professor Neyman, he informed me it was among the results of a thesis by R. Satô, Contributions to the theory of testing statistical composite hypotheses, University of London, 1937, and he kindly sent me a copy of the MS. I decided nevertheless to publish my version of the theorem and proof, since for the reasons indicated in section 1 this theory should be available in the literature.

The necessary condition (9) for (7) is also sufficient. Denoting by $\mathcal{E}(f \mid w, \theta)$ the expected value of a function f(E, θ) calculated under the assumption that E ∈ w, equation (9) may be written

$$\mathcal{E}\Big(\prod_{i=2}^{l}\phi_i^{k_i} \,\Big|\, w_0, \theta\Big) = \mathcal{E}\Big(\prod_{i=2}^{l}\phi_i^{k_i} \,\Big|\, W, \theta\Big). \tag{11}$$

From assumption 5°(a) it then follows that

$$Q_0(\Phi \mid w_0, \theta) = Q_0(\Phi \mid W, \theta) \tag{12}$$

for almost all Φ. Conversely, (12) implies (11).

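In a similar manner we get from (8), with the aid of (9),

$$\mathcal{E}\Big(\phi_1\prod_{i=2}^{l}\phi_i^{k_i} \,\Big|\, w_0, \theta\Big) = \mathcal{E}\Big(\phi_1\prod_{i=2}^{l}\phi_i^{k_i} \,\Big|\, W, \theta\Big). \tag{13}$$

We calculate the moments of the function $Q_1(\Phi \mid w, \theta)$ to be

$$\mathcal{E}\Big(\phi_1\prod_{i=2}^{l}\phi_i^{k_i} \,\Big|\, w, \theta\Big),$$

and hence, because of 5°(b), (13) implies

$$Q_1(\Phi \mid w_0, \theta) = Q_1(\Phi \mid W, \theta) \tag{14}$$

almost everywhere in the Φ-space. The pair of conditions (12), (14) is equivalent to the pair (7), (8).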

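In order that $w_0$ be a type B region, it is necessary and sufficient that it satisfy (12) and (14) and that

$$P''(w_0 \mid \theta) \ge P''(w_1 \mid \theta)$$

for all $w_1$ satisfying (12) and (14). The inequality may be transformed with the help of 1°(a), 2°, (3), (7), (8), and (10) to

$$\int_{w_0} \phi_1^2\, p(E \mid \theta)\, dW \ge \int_{w_1} \phi_1^2\, p(E \mid \theta)\, dW,$$

which, since $P(w_0 \mid \theta) = P(w_1 \mid \theta) = \alpha$, is equivalent to

$$\int_{-\infty}^{+\infty}\!\!\cdots\int_{-\infty}^{+\infty} \phi_1^2\, p(\phi_1, \Phi \mid w_0, \theta)\, d\phi_1\, d\phi_2 \cdots d\phi_l \ge \int_{-\infty}^{+\infty}\!\!\cdots\int_{-\infty}^{+\infty} \phi_1^2\, p(\phi_1, \Phi \mid w_1, \theta)\, d\phi_1\, d\phi_2 \cdots d\phi_l.$$

Sufficient for this is

$$Q_2(\Phi \mid w_0, \theta) \ge Q_2(\Phi \mid w_1, \theta). \tag{15}$$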
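The transformation of $P''(w_0 \mid \theta) \ge P''(w_1 \mid \theta)$ rests on the identity $\partial^2 p/\partial\theta_1^2 = (\phi_1^2 + \partial\phi_1/\partial\theta_1)\,p$ together with (3) for i = 1. Written out for any region w satisfying (7) and (8), and hence also (10):

$$P''(w \mid \theta) = \int_w \Big(\phi_1^2 + \frac{\partial\phi_1}{\partial\theta_1}\Big) p\, dW = \int_w \phi_1^2\, p\, dW + A_{11}\int_w p\, dW + \sum_{k=1}^{l} B_{11k}\int_w \phi_k\, p\, dW = \int_w \phi_1^2\, p\, dW + \alpha A_{11}.$$

The term $\alpha A_{11}$ is the same for $w_0$ and $w_1$, so only the first term matters in the comparison.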

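We note that the functions in (12), (14), and (15) are all of the form (4), with s = 0, 1, 2 respectively. We propose to transform them into integrals over certain portions of the sample space W. First, we write (4) in the form

$$Q_s(\Phi \mid w, \theta) = Q_0(\Phi \mid w, \theta)\int_{-\infty}^{+\infty} \phi_1^s\, p(\phi_1 \mid \Phi, w, \theta)\, d\phi_1 = Q_0(\Phi \mid w, \theta)\,\mathcal{E}(\phi_1^s \mid \Phi, w, \theta). \tag{16}$$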

Next, we consider “surfaces” $S(\Phi, \theta)$ in $W_+$, constructed as follows. For any fixed θ let D(θ) be the (l − 1)-dimensional domain of values of $\phi_i(E, \theta)$, i = 2, 3, ..., l, for $E \in W_+$. A “surface” $S(\Phi, \theta)$ is the locus of points E for which

$$\phi_i(E, \theta) = \phi_i', \text{ a constant}, \qquad i = 2, 3, \ldots, l, \tag{17}$$

the set of constants being in D(θ). Over every “surface” we now define a density $\bar p$. Without loss of generality, and to simplify the notation, we shall assume that the non-singular minor postulated in 4° contains the minor $(\partial\phi_i/\partial x_j)$, i = 2, 3, ..., l; j = 1, 2, ..., l − 1, and denote by J(E, θ) its determinant. For E on $S(\Phi, \theta)$ we define the density

$$\bar p(E \mid \theta) = p(E \mid \theta)/|J(E, \theta)|, \tag{18}$$

and consider “surface” integrals

$$\int_{wS(\Phi, \theta)} F_s(E, \theta)\, dx_l\, dx_{l+1} \cdots dx_n, \tag{19}$$

where

$$F_s(E, \theta) = \phi_1^s(E, \theta)\, \bar p(E \mid \theta). \tag{20}$$

A “surface” integral (19) is to be distinguished from an ordinary multiple integral, in that the integrand is not merely a function of $x_l, x_{l+1}, \ldots, x_n$: there may be several points E on the surface with the same values for these coordinates, but different values for the integrand. The integral is to be thought of as follows. The part $wS(\Phi, \theta)$ of the “surface” $S(\Phi, \theta)$ is partitioned into pieces ΔS, and on each piece a point E is chosen. The value of the integrand at E is then multiplied by the “area” (taken non-negative) of the projection of ΔS on the $x_l, x_{l+1}, \ldots, x_n$-space. The “surface” integral is the limit of the sum of such products as the norm of the partition approaches zero.

Denoting the integral (19) by I(s) for the moment, we may calculate that for Φ ∈ D(θ)

$$I(s) = I(0)\,\mathcal{E}(\phi_1^s \mid \Phi, w, \theta), \qquad I(0) = Q_0(\Phi \mid w, \theta)\, P(w \mid \theta),$$

and hence we see that the right member of (16) is equal to the integral (19) divided by $P(w \mid \theta)$. The desired relationship between the ordinary integrals (4) and the “surface” integrals (19) is thus

$$Q_s(\Phi \mid w, \theta) = \int_{wS(\Phi, \theta)} F_s(E, \theta) \prod_{j=l}^{n} dx_j \Big/ P(w \mid \theta). \tag{21}$$

The conditions (12), (14), (15) may now be written

$$\int_{w_0 S(\Phi, \theta)} F_s(E, \theta) \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi, \theta)} F_s(E, \theta) \prod_{j=l}^{n} dx_j, \qquad s = 0, 1, \tag{22}$$

$$\int_{w_0 S(\Phi, \theta)} F_2(E, \theta) \prod_{j=l}^{n} dx_j \ge \int_{w_1 S(\Phi, \theta)} F_2(E, \theta) \prod_{j=l}^{n} dx_j, \tag{23}$$

if Φ is in the domain D(θ); otherwise they are satisfied trivially. $w_0$ will be a type B region if equations (22) are satisfied for almost all Φ ∈ D(θ), and if (23) is valid for all $w_1$ satisfying (22).

We now hold θ fixed in ω and Φ fixed in D(θ), so that $S(\Phi, \theta)$ is fixed, and the right members of equations (22) have constant values. The proof [5, p. 11] of the lemma of Neyman and Pearson, which gives sufficient conditions for a region to maximize an integral subject to integral side-conditions, is easily seen to be valid for our “surface” integrals. A sufficient condition that $w_0 S(\Phi, \theta)$ have the desired property is then that it be defined by

$$\phi_1^2(E, \theta) > a_0 + a_1\phi_1(E, \theta), \tag{24}$$

where $a_0, a_1$ are independent of E on $S(\Phi, \theta)$, and are such that equations (22) are satisfied. Since θ and Φ are fixed, we may permit $a_i$ to be of the nature $a_i = a_i(\Phi, \theta)$, i = 0, 1. Introducing functions $k_1 < k_2$, $k_i = k_i(\Phi, \theta)$, and defining $a_0, a_1$ from

$$a_0 = -k_1 k_2, \qquad a_1 = k_1 + k_2,$$

the inequality (24) is satisfied if (6) is. Still holding θ fixed, suppose that $k_1, k_2$ can be determined for all Φ (hence almost all Φ) in D(θ) so that for the part $w_0 S(\Phi, \theta)$ of $S(\Phi, \theta)$, defined by (6), the equations (22) are satisfied. The parts $w_0 S(\Phi, \theta)$ of “surfaces” then sweep out a “solid” $w_0(\theta)$ in $W_+$, defined by (6). If we can similarly determine $k_1$ and $k_2$, and hence $w_0(\theta)$, for every θ in ω, and if furthermore $w_0(\theta)$ is independent of θ, then it is the type B region we seek.
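The choice of $a_0, a_1$ works because the quadratic in (24) factors:

$$\phi_1^2 - a_1\phi_1 - a_0 = \phi_1^2 - (k_1 + k_2)\phi_1 + k_1 k_2 = (\phi_1 - k_1)(\phi_1 - k_2).$$

For $k_1 < k_2$ this product is positive exactly when $\phi_1 < k_1$ or $\phi_1 > k_2$, so (24) and (6) define the same part of $S(\Phi, \theta)$.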

The equations (22) have now served their main purpose, and we return to their equivalents, (12) and (14). For $w_0(\theta)$ defined by (6),

$$p(\phi_1, \Phi \mid w_0, \theta) = p(\phi_1, \Phi \mid W, \theta)/\alpha \quad\text{if } \phi_1 < k_1 \text{ or } \phi_1 > k_2,$$

and vanishes otherwise, and hence equations (12) and (14) are equivalent to (5).

The remainder of the proof consists of deducing that $k_1, k_2$ exist, and that the associated region $w_0(\theta)$ is independent of θ, for all θ ∈ ω, from the hypothesis of our theorem that $k_1, k_2$ exist for some θ = θ'. By 1°(c), θ' lies on a line segment L entirely in ω, on which all but one of the nuisance parameters, say all but $\theta_2$, are constant. Let us vary θ over L. Then $\theta_3, \theta_4, \ldots, \theta_l$ remain fixed and $\theta_2$ varies over an interval I. The equations (2) for j = 2 now become ordinary differential equations in which the independent variable is $\theta_2$, the dependent variables are $\phi_2, \phi_3, \ldots, \phi_l$, and $\theta_1^0, \theta_3, \ldots, \theta_l$ are parameters. A well known existence theorem assures us of the existence of particular solutions $u_i$, and of a matrix $(u_{ij})$ of complementary solutions, non-singular for all $\theta_2$ in I, i, j = 2, 3, ..., l, such that the general solution is

$$\phi_i = u_i + \sum_{j=2}^{l} u_{ij}\, c_j.$$

The $u_i$ are determined by initial conditions for the system (2) with j = 2, and the $u_{ij}$ by sets of initial conditions for the corresponding complementary system. Clearly, if these initial conditions are all chosen independent of E, then since the coefficients of the differential equations are all independent of E, the solutions $u_i$ and $u_{ij}$ enjoy the same property. On the other hand, the $c_j$ are independent of $\theta_2$. Hence

$$\phi_i(E, \theta_2) = u_i(\theta_2) + \sum_{j=2}^{l} u_{ij}(\theta_2)\, c_j(E), \qquad i = 2, 3, \ldots, l. \tag{25}$$

The dependence of the φ's, u's and c's on the parameters $\theta_1^0, \theta_3, \ldots, \theta_l$ has not been indicated, since these remain fixed throughout the present calculations.

Let $\mathfrak{D}$ be the (l − 1)-dimensional domain of the values of $c_j(E)$ for $E \in W_+$, let $C: (c_2', c_3', \ldots, c_l')$ be a point in $\mathfrak{D}$, and denote by S(C) the “surface” $c_j(E) = c_j'$. Denote the surface $S(\Phi, \theta)$ defined in (17) by $S(\Phi, \theta_2)$, and the domain D(θ) of Φ by $D(\theta_2)$. Then, since $|u_{ij}| \ne 0$, for every $\theta_2 \in I$ every S(C) with $C \in \mathfrak{D}$ is identical with some $S(\Phi, \theta_2)$ with $\Phi \in D(\theta_2)$, and vice versa. From this we conclude, for later reference:

(A) the functions $c_j(E)$ are constant on every $S(\Phi, \theta_2)$;

(B) if $\theta_2', \theta_2''$ are any two values in I, then for every $\Phi'' \in D(\theta_2'')$ there exists a $\Phi' \in D(\theta_2')$ such that $S(\Phi', \theta_2')$ is identical with $S(\Phi'', \theta_2'')$, and vice versa.

Now let us integrate with respect to $\theta_2$ the equation

$$\partial \log p(E \mid \theta_2)/\partial\theta_2 = \phi_2 = u_2(\theta_2) + \sum_{j=2}^{l} u_{2j}(\theta_2)\, c_j(E).$$

We get

$$\log p(E \mid \theta_2) = \nu(\theta_2) + \sum_{j=2}^{l} \nu_j(\theta_2)\, c_j(E) + f(E),$$

where $\nu(\theta_2)$, $\nu_j(\theta_2)$, f(E), and all new undefined symbols in the sequel have obvious meanings. Hence

$$p(E \mid \theta_2) = \bar\nu(\theta_2)\, \bar f(E) \exp\Big[\sum_{j=2}^{l} \nu_j(\theta_2)\, c_j(E)\Big]. \tag{26}$$

Next we differentiate the equations (25) with respect to $x_k$, and write the result in matrix form,

$$(\partial\phi_i/\partial x_k) = (u_{ij})(\partial c_j/\partial x_k), \qquad i, j = 2, 3, \ldots, l;\ k = 1, 2, \ldots, l-1.$$

Taking determinants, we have

$$J(E, \theta_2) = J_1(\theta_2)\, J_2(E). \tag{27}$$





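Finally, we shall need to know the nature of the dependence of $\phi_1$ on $\theta_2$ and E. From (3) with i = 2, since $\partial\phi_1/\partial\theta_2 = \partial\phi_2/\partial\theta_1$,

$$\partial\phi_1/\partial\theta_2 = A_{21}(\theta_2) + B_{211}(\theta_2)\,\phi_1 + \sum_{k=2}^{l} B_{21k}(\theta_2)\,\phi_k.$$

Substituting from (25), we get

$$\partial\phi_1/\partial\theta_2 = B_{211}(\theta_2)\,\phi_1 + A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E),$$

and integrating,

$$\phi_1(E, \theta_2) = B(\theta_2)\Big[\int \frac{A(\theta_2) + \sum_{j=2}^{l} B_j(\theta_2)\, c_j(E)}{B(\theta_2)}\, d\theta_2 + g(E)\Big],$$

where

$$B(\theta_2) = \exp \int B_{211}(\theta_2)\, d\theta_2. \tag{28}$$

Thus

$$\phi_1(E, \theta_2) = \bar A(\theta_2) + \sum_{j=2}^{l} \bar B_j(\theta_2)\, c_j(E) + B(\theta_2)\, g(E). \tag{29}$$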


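In equations (22) we now use the definitions (20), (18) for the integrands and then substitute (26), (27), (29). As a result we obtain the equality of

$$\int_{w_0 S(\Phi, \theta_2)} \Big[\bar A(\theta_2) + \sum_{k=2}^{l} \bar B_k(\theta_2)\, c_k(E) + B(\theta_2)\, g(E)\Big]^s \frac{\bar\nu(\theta_2)\, \bar f(E) \exp\big[\sum_{k=2}^{l} \nu_k(\theta_2)\, c_k(E)\big]}{|J_1(\theta_2)\, J_2(E)|} \prod_{j=l}^{n} dx_j$$

and α times the “surface” integral of the same integrand over $S(\Phi, \theta_2)$. Putting first s = 0 and then s = 1, and employing the previous conclusion (A), we find that the equations (22) are equivalent to

$$\int_{w_0 S(\Phi, \theta_2)} g^s(E)\, \frac{\bar f(E)}{|J_2(E)|} \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi, \theta_2)} g^s(E)\, \frac{\bar f(E)}{|J_2(E)|} \prod_{j=l}^{n} dx_j, \qquad s = 0, 1. \tag{30}$$

Again using the expression (29) for $\phi_1$, and noting from (28) that $B(\theta_2) > 0$, we may write the inequality (6) in the form

$$g(E) < \kappa_1(\Phi, \theta_2) \quad\text{or}\quad g(E) > \kappa_2(\Phi, \theta_2), \tag{31}$$

where

$$\kappa_i(\Phi, \theta_2) = \Big[k_i(\Phi, \theta_1^0, \vartheta) - \bar A(\theta_2) - \sum_{j=2}^{l} \bar B_j(\theta_2)\, c_j(E)\Big] \Big/ B(\theta_2). \tag{32}$$

By (A), the $c_j(E)$ are constant on $S(\Phi, \theta_2)$, so $\kappa_i$ indeed depends only on Φ and $\theta_2$.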





It follows from our hypothesis that for $\theta_2 = \theta_2'$ (the $\theta_2$ coordinate of θ') and any $\Phi \in D(\theta_2')$, functions $\kappa_i(\Phi, \theta_2')$ exist such that for the part $w_0 S(\Phi, \theta_2')$ of $S(\Phi, \theta_2')$, defined by (31), equations (30) are satisfied. The region $w_0(\theta')$ is “swept out” by $w_0 S(\Phi, \theta_2')$ as Φ ranges over $D(\theta_2')$. Now let θ'' be any other θ ∈ L, call its $\theta_2$ coordinate $\theta_2''$, and let Φ'' be any $\Phi \in D(\theta_2'')$. Consider the possibility of finding $\kappa_i(\Phi'', \theta_2'')$ such that on the part $w_0 S(\Phi'', \theta_2'')$ of $S(\Phi'', \theta_2'')$, defined by (31), equations (30) are satisfied. From the conclusion (B), $S(\Phi'', \theta_2'')$ is identical with $S(\Phi', \theta_2')$ for a suitably chosen $\Phi' \in D(\theta_2')$. Hence if we take $\kappa_i(\Phi'', \theta_2'') = \kappa_i(\Phi', \theta_2')$, then $w_0 S(\Phi'', \theta_2'')$ becomes identical with $w_0 S(\Phi', \theta_2')$, where equations (30) are already satisfied. Letting Φ'' range over $D(\theta_2'')$, every $w_0 S(\Phi'', \theta_2'')$ thus determined becomes identical with some $w_0 S(\Phi', \theta_2')$, and vice versa, by (B). Thus the region $w_0(\theta'')$ “swept out” is identical with $w_0(\theta')$. This process defines $\kappa_i(\Phi, \theta_2)$ for all $\theta_2 \in I$ and $\Phi \in D(\theta_2)$, and hence determines $k_i(\Phi, \theta_1^0, \vartheta)$ from (32). We now have functions $k_i(\Phi, \theta_1^0, \vartheta)$, $k_1 < k_2$, satisfying (5), and corresponding regions $w_0(\theta)$ independent of θ, for all θ ∈ L. To conclude the proof, we use 1°(c) to reach any point θ ∈ ω from θ' by a path consisting of a finite number of segments like L, on each of which only one of the nuisance parameters varies. The definitions of $k_i(\Phi, \theta_1^0, \vartheta)$ are continued along this path as above, and the region $w_0(\theta)$ is seen to be independent of θ for all θ in ω.

The following theorem may be regarded as a generalization of one by Neyman [6, p. 33] giving sufficient conditions that a type A region be also of type $A_1$.

Theorem 2. Suppose the assumption 1°(b) holds for all θ ∈ Ω. Denote $\phi_i(E, \theta_1^0, \vartheta)$ by $\phi_i^0$, and let R(ϑ) be the domain of values of $\phi_1^0, \phi_2^0, \ldots, \phi_l^0$ for $E \in W_+$ and θ ∈ ω. Then a sufficient condition that a region $w_0$ of type B, found by application of theorem 1, be also of type $B_1$ is that for all θ ∈ Ω and all $E \in W_+$

$$p(E \mid \theta_1, \vartheta) = p(E \mid \theta_1^0, \vartheta)\, g(\phi_1^0, \phi_2^0, \ldots, \phi_l^0;\ \theta_1^0;\ \theta_1, \vartheta), \tag{33}$$

where $g(y_1, y_2, \ldots, y_l;\ \theta_1^0;\ \theta_1, \vartheta)$ is a function such that $\partial^2 g/\partial y_1^2 > 0$ for all $y_1, y_2, \ldots, y_l$ in R(ϑ) and θ ∈ Ω − ω.

For the $w_0$ satisfying the sufficient conditions of theorem 1, the conditions (a), (b'), (c) of definition 2 are satisfied, and it remains only to verify the condition (d'). The regions $w_1$ admitted for comparison in (d'), as well as $w_0$, must satisfy the equations (22), since these are equivalent to the conditions (a), (c). We recall that $\theta = (\theta_1^0, \vartheta)$ in equations (22) and rewrite them, for $w = w_0$ and $w = w_1$, in a notation better adapted to our present considerations:

$$\int_{wS(\Phi^0, \theta_1^0, \vartheta)} [\phi_1^0]^s \frac{p(E \mid \theta_1^0, \vartheta)}{|J(E, \theta_1^0, \vartheta)|} \prod_{j=l}^{n} dx_j = \alpha \int_{S(\Phi^0, \theta_1^0, \vartheta)} [\phi_1^0]^s \frac{p(E \mid \theta_1^0, \vartheta)}{|J(E, \theta_1^0, \vartheta)|} \prod_{j=l}^{n} dx_j, \qquad s = 0, 1, \tag{34}$$

where $\Phi^0 = (\phi_2^0, \phi_3^0, \ldots, \phi_l^0) \in D(\theta_1^0, \vartheta)$.

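To express the condition (d) in a convenient way, we now "shred" the regions $w_0$, $w_1$ of (d) for every $\theta_1$ by means of the same "surfaces" we have been using for $\theta_1 = \theta_1^0$. For any $w$ in $W_+$, any $\theta_1$, and any $\Phi^0 \in D(\theta_1^0, \vartheta)$ we define a "surface" integral

$I(\Phi^0, w \mid \theta_1, \vartheta) = \int_{w(\Phi^0, \theta_1^0, \vartheta)} \{p(E \mid \theta_1, \vartheta)/|J(E, \theta_1^0, \vartheta)|\} \prod_{j=l}^{n} dx_j.$

Then

$P(w \mid \theta_1, \vartheta) = \int \cdots \int I(\Phi^0, w \mid \theta_1, \vartheta)\, d\phi_2^0 \cdots d\phi_l^0,$

and a sufficient condition for (d) is

(35) $I(\Phi^0, w_0 \mid \theta_1, \vartheta) \ge I(\Phi^0, w_1 \mid \theta_1, \vartheta)$

for all $\theta \in \Omega$ and all $\Phi^0 \in D(\theta_1^0, \vartheta)$.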

Applying the lemma of Neyman and Pearson again, this time to the integrands of the "surface" integrals in (34) and (35), we find that our region $w_0$ is of type $B_1$ if there exist functions $b_i(\Phi^0, \theta_1^0, \theta_1, \vartheta)$, $i = 0, 1$, such that

$p(E \mid \theta_1, \vartheta) > p(E \mid \theta_1^0, \vartheta)\,[b_0 + b_1 \phi_1^0(E, \theta_1^0, \vartheta)]$

if and only if $E \in w_0$. Employing (33), we may replace this inequality by

(36) $g(\phi_1^0, \Phi^0; \theta_1^0, \theta_1, \vartheta) > b_0 + b_1 \phi_1^0.$

Define $b_0$, $b_1$ from

$g(k_i, \Phi^0; \theta_1^0, \theta_1, \vartheta) = b_0 + b_1 k_i, \qquad i = 1, 2,$

where $k_i = k_i(\Phi^0, \theta_1^0, \vartheta)$. Since $k_1 < k_2$, these equations have unique solutions $b_0$, $b_1$. Now hold $\Phi^0$, $\theta_1$, $\vartheta$ all fixed ($\theta_1 \ne \theta_1^0$), and consider the graphs of the two members of (36) as functions of $\phi_1^0$. By our definition of $b_0$ and $b_1$, these graphs intersect at $k_1$ and $k_2$. By hypothesis, however, the graph of the left member is everywhere concave up. Hence for $k_1 < \phi_1^0 < k_2$ it lies below the linear graph of the right member, and for $\phi_1^0 < k_1$ and $\phi_1^0 > k_2$ it lies above. That is, (36) is true if and only if $E \in w_0$.
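The last step of the proof uses an elementary property of convex functions. A curve that is everywhere concave up lies below the chord joining two of its points between those points, and above the chord outside them. The short Python sketch below checks this numerically. The particular g (an exponential in y) and the values of k_1 and k_2 are hypothetical, chosen only for illustration. The coefficients b_0 and b_1 are obtained from g(k_i) = b_0 + b_1 k_i exactly as in the text.

```python
import math


def chord(g, k1, k2):
    """Solve g(k_i) = b0 + b1 * k_i, i = 1, 2, for (b0, b1); requires k1 < k2."""
    b1 = (g(k2) - g(k1)) / (k2 - k1)
    b0 = g(k1) - b1 * k1
    return b0, b1


def in_region(g, k1, k2, y):
    """Inequality (36): g(y) > b0 + b1 * y. For a strictly convex g this holds
    exactly when y lies outside the closed interval [k1, k2]."""
    b0, b1 = chord(g, k1, k2)
    return g(y) > b0 + b1 * y


# Hypothetical convex g (d^2 g / dy^2 > 0) standing in for g(y; theta_1^0, theta_1, ...).
g = lambda y: math.exp(0.7 * y)
k1, k2 = -1.0, 2.0
for y in (-2.0, 0.0, 1.5, 3.0):
    print(y, in_region(g, k1, k2, y))
```

The points inside $(k_1, k_2)$ fail the inequality, and the points outside satisfy it. This is the two-tailed shape of $w_0$ that the proof requires.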

5. Appendix on the moment problem. Easily applied criteria [8] are available for the moment problem of assumption 5°(a). The moment problem 5°(b) is much more difficult, however, because the function to be determined by its moments is not of constant sign. Below we offer a proof that the solutions of both problems 5°(a) and 5°(b) are unique in the important case where $p(E \mid \theta)$ is a multivariate normal p.d.f. and $\phi_1, \phi_2, \cdots, \phi_l$ are polynomials in $x_1, x_2, \cdots, x_n$ of degree $\le 2$, not necessarily homogeneous. Since $\theta$ is held fixed, we will not indicate dependence on $\theta$. Nor will we indicate the dependence of various functions on $s$, since $s = 0$ or else $s = 1$ throughout.

Let $w_1$, $w_2$ be any two regions, $\alpha_j = P(w_j) \ne 0$, for which the moments of $Q_s(\Phi \mid w_1)$ and $Q_s(\Phi \mid w_2)$ are the same. To prove the equality (almost everywhere) of these two functions it suffices to prove that their Fourier transforms are identical [7, theorem 61]. Suppressing the customary multiple of $\sqrt{2\pi}$, the Fourier transform of $Q_s(\Phi \mid w_j)$ is

$\Psi_j(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i t \cdot \Phi}\, Q_s(\Phi \mid w_j)\, d\phi_2 \cdots d\phi_l,$

where $t$ is the vector $(t_2, t_3, \cdots, t_l)$ and $t \cdot \Phi = t_2 \phi_2 + \cdots + t_l \phi_l$. From (4) we get

$\Psi_j(t) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i t \cdot \Phi}\, \phi_1^s\, p(\phi_1, \Phi \mid w_j)\, d\phi_1\, d\phi_2 \cdots d\phi_l = E(e^{i t \cdot \Phi} \phi_1^s \mid w_j) = \frac{1}{\alpha_j} \int_{w_j} e^{i t \cdot \Phi}\, \phi_1^s(E)\, p(E)\, dW.$

A device of Cramér and Wold [8] for reducing the dimensionality of the problem now suggests itself. Let $z$ be a scalar variable, and for fixed $t$ consider $\psi_j(z \mid t) = \Psi_j(zt)$ as a function of $z$. Clearly, if $\psi_1(z \mid t) = \psi_2(z \mid t)$ for every fixed $t$, then $\Psi_1(t) = \Psi_2(t)$, and we are through. We propose to prove the former equality in two steps. First we show that $\psi_j$ is an analytic function of $z$ for all real $z$. Second we show that the coefficients of the power series for $\psi_1$ and $\psi_2$ in powers of $z$ are equal. Holding $t$ fixed now, $\zeta = t \cdot \Phi$ is a polynomial of degree $\le 2$, and

(37) $\psi_j(z \mid t) = \frac{1}{\alpha_j} \int_{w_j} e^{iz\zeta}\, \phi_1^s\, p\, dW.$

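By our assumption of normality,

$p = C \exp\Big(-\sum_{j,k} a_{jk} y_j y_k\Big), \qquad y_i = x_i - \mu_i,$

where the matrix $(a_{jk})$ is positive definite. To prove the analyticity of $\psi_j$ at any real $z = z_0$, let $z = z_0 + \xi$ and restrict $\xi$ to real values. Substitute in (37)

$e^{i\xi\zeta} = \sum_{q=0}^{m-1} \frac{(i\xi\zeta)^q}{q!} + \eta_m \frac{(\xi\zeta)^m}{m!},$

where $|\eta_m| \le 1$. Then

$\psi_j(z_0 + \xi \mid t) = \sum_{q=0}^{m-1} \frac{(i\xi)^q}{q!\, \alpha_j} \int_{w_j} e^{iz_0\zeta}\, \zeta^q\, \phi_1^s\, p\, dW + R_{jm}(z_0, \xi),$

where

$R_{jm}(z_0, \xi) = \frac{\xi^m}{m!\, \alpha_j} \int_{w_j} \eta_m\, e^{iz_0\zeta}\, \zeta^m\, \phi_1^s\, p\, dW,$

and all integrands are absolutely integrable over $W$. Let $\sigma$ be the sphere of unit radius with center at $(\mu_1, \mu_2, \cdots, \mu_n)$ in $W$, and write

$R_{jm} = \frac{\xi^m}{m!\, \alpha_j} \int_{w_j \sigma} \eta_m\, e^{iz_0\zeta}\, \zeta^m\, \phi_1^s\, p\, dW + \frac{\xi^m}{m!\, \alpha_j} \int_{w_j - w_j \sigma} \eta_m\, e^{iz_0\zeta}\, \zeta^m\, \phi_1^s\, p\, dW.$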


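Call the two terms of the right member $R'_{jm}$ and $R''_{jm}$, so that $R_{jm} = R'_{jm} + R''_{jm}$. Then

$|R'_{jm}| \le \frac{|\xi|^m}{m!\, \alpha_j} \int_{w_j \sigma} |\zeta^m \phi_1^s|\, p\, dW.$

Let $M = \max |\zeta|$ and $M_1 = \max |\phi_1^s|$ for $E \in \sigma$. Then

$|R'_{jm}| \le \frac{M_1 M^m |\xi|^m}{m!\, \alpha_j} \int_{\sigma} p\, dW \le \frac{M_1 M^m |\xi|^m}{m!\, \alpha_j}.$

Hence $R'_{jm} \to 0$ for all real $\xi$ as $m \to \infty$. For the second term,

$|R''_{jm}| \le \frac{|\xi|^m}{m!\, \alpha_j} \int_{W - \sigma} |\zeta^m \phi_1^s|\, p\, dW.$

Let $r = (\sum_k y_k^2)^{1/2}$, and let $M_2$, $M_3$ be the sums of the absolute values of the coefficients of the polynomials $\phi_1^s$ and $\zeta$, respectively, when expanded in powers of the $y_k$. Then for $E \in W - \sigma$ we have $|\phi_1^s| \le M_2 r^2$, $|\zeta| \le M_3 r^2$, and $p \le C \exp(-\lambda r^2)$, where $\lambda > 0$ is the smallest characteristic root of $(a_{jk})$. Hence

$|R''_{jm}| \le \frac{C M_2 M_3^m |\xi|^m}{m!\, \alpha_j} \int_{W - \sigma} r^{2m+2} e^{-\lambda r^2}\, dW.$

Integrating over spherical shells concentric with $\sigma$, we have $dW = A r^{n-1}\, dr$ ($A$ a constant), and

$|R''_{jm}| \le \frac{C A M_2 M_3^m |\xi|^m}{m!\, \alpha_j} \int_1^{\infty} r^{2m+n+1} e^{-\lambda r^2}\, dr \le \frac{C A M_2 (M_3 |\xi|)^m}{m!\, \alpha_j} \int_0^{\infty} r^{2m+n+1} e^{-\lambda r^2}\, dr.$

If we evaluate the last integral in terms of a Gamma function and employ Stirling's formula, we easily find that $R''_{jm} \to 0$ for $M_3 |\xi| < \lambda$. The convergence of $R_{jm}$ to zero for real $\xi$ with $|\xi| < \lambda / M_3$ is sufficient to insure the analyticity of $\psi_j$.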

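Now let $z_0 = 0$. Then the coefficient of $\xi^q$ in the power series for $\psi_j$ is

$\frac{i^q}{q!\, \alpha_j} \int_{w_j} (t_2 \phi_2 + \cdots + t_l \phi_l)^q\, \phi_1^s\, p\, dW.$

This is a linear combination (the same for $j = 1, 2$) of the $q$-th order moments of $Q_s(\Phi \mid w_j)$. Hence corresponding coefficients for $\psi_1$ and $\psi_2$ are equal.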
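The Gamma-function step can be made concrete. With the substitution $u = \lambda r^2$ one gets $\int_0^\infty r^{2m+n+1} e^{-\lambda r^2} dr = \Gamma(m + n/2 + 1)/(2\lambda^{m+n/2+1})$. The bound on $|R''_{jm}|$ is therefore proportional to $T_m = \Gamma(m + n/2 + 1)\,\rho^m / (m!\, 2\lambda^{n/2+1})$ with $\rho = M_3|\xi|/\lambda$, and $T_{m+1}/T_m \to \rho$. The Python sketch below does two things. It checks the closed form against a midpoint-rule quadrature, and it shows $T_m$ shrinking for $\rho < 1$ and growing for $\rho > 1$. The values of n, λ and ρ are arbitrary and illustrative, and the constants $C$, $A$, $M_2$, $\alpha_j$ are dropped.

```python
import math


def radial_integral(k, lam):
    """Closed form of int_0^inf r**k exp(-lam r**2) dr = Gamma((k+1)/2) / (2 lam**((k+1)/2))."""
    return math.gamma((k + 1) / 2) / (2 * lam ** ((k + 1) / 2))


def radial_integral_numeric(k, lam, upper=20.0, steps=200_000):
    """Midpoint-rule approximation of the same integral on [0, upper]."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** k * math.exp(-lam * ((i + 0.5) * h) ** 2)
                   for i in range(steps))


def log_T(m, n, lam, rho):
    """log T_m, where T_m = Gamma(m + n/2 + 1) rho**m / (m! * 2 * lam**(n/2 + 1)),
    i.e. the m-dependent part of the bound on |R''_jm| with rho = M_3 |xi| / lam."""
    return (math.lgamma(m + n / 2 + 1) - math.lgamma(m + 1) + m * math.log(rho)
            - math.log(2.0) - (n / 2 + 1) * math.log(lam))


for rho in (0.5, 1.5):
    print(rho, [round(log_T(m, 4, 1.3, rho), 1) for m in (10, 100, 400)])
```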





REFERENCES

[1] J. Neyman, "Sur la vérification des hypothèses statistiques composées," Bull. Soc. Math. France, Vol. 63 (1935), pp. 246-266.

[2] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability," Phil. Trans. Roy. Soc. London, A, Vol. 236 (1937), pp. 333-380.

[3] J. Neyman, "On a statistical problem arising in routine analyses and in sampling inspections of mass production," Annals of Math. Stat., Vol. 12 (1941), pp. 46-76.

[4] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Phil. Trans. Roy. Soc. London, A, Vol. 231 (1933), pp. 289-337.

[5] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses, Part I," Stat. Res. Mem., Vol. 1 (1936), pp. 1-37.

[6] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses, Part II," Stat. Res. Mem., Vol. 2 (1938), pp. 25-36.

[7] S. Bochner, Vorlesungen über Fouriersche Integrale, Leipzig, 1932.

[8] H. Cramér and H. Wold, "Some theorems on distribution functions," Jour. London Math. Soc., Vol. 11 (1936), pp. 290-294.



ON THE PROBLEM OF MULTIPLE MATCHING 

By I. L. Battin 

Drew University 

1. Introduction. The problem of determining the distribution of the number of "hits" or "matchings" under random matching of two decks of cards has received attention from a number of authors within the last few years. In 1934 Chapman [2] considered pairings between two series of $t$ elements each. He later [3] generalized the problem to series of $u$ and $t$ ($< u$) elements respectively. In the same paper he also considered the distribution of the mean number of correct matchings resulting from $n$ independent trials, and gave a method, and tables, for determining the significance of any obtained mean. In 1937 Bartlett [1] considered matchings of two decks of cards, using a number of interesting moment generating functions. In 1937 Huntington [12, 13] gave tables of probabilities for matchings between decks with the compositions $(5^5)$, $(4^4)$, and $(3^3)$, where $(s^t)$ denotes a deck consisting of $s$ of each of $t$ kinds of cards. More generally, $(s_1 s_2 \cdots s_t)$ denotes $s_1$ cards of the first kind, $s_2$ of the second, and so on. Sterne [16] has given the first four moments of the frequency distribution for the $(5^5)$ case and has fitted a Pearson Type I distribution function to the distribution. Sterne obtained his results by considering the probabilities in a $5 \times 5$ contingency table. He also considered the $4 \times 4$ and $3 \times 3$ cases. In 1938 Greville [7] gave a table of the exact probabilities for matchings between two decks of composition $(5^5)$. Greenwood [4] obtained the variance of the distribution of hits for matchings between two decks having the respective compositions $(s^t)$ and $(s_1 s_2 \cdots s_t)$, with $s_1 + s_2 + \cdots + s_t = st = n$; not all the $s_i$ need be different from zero. Earlier, Wilks [19] had considered the same problem for $t = 5$ and $n = 25$.

In a very interesting paper in 1938, Olds [15] used permanents to express a moment generating function suitable for the problem in question. He obtained factorial moments and the first four ordinary moments about the mean, first for two decks with composition $(4^5)$, and then for two decks of composition $(s^t)$. In 1938 Stevens [17] considered a contingency table in connection with matchings between two sets of $n$ objects each. He gave the means, variances, and covariances of the single cell entries and of various sub-totals of the cell entries. Stevens [18] also gave a treatment of the problem of matchings between two decks based on elementary considerations. In 1940 Greenwood [6] gave the first four moments of the distribution of hits between two decks of any composition whatever, generalizing the problem which had been treated earlier by Olds [15]. Finally, in 1941 Greville [8] gave the exact distribution of hits for matchings between two decks of arbitrary composition. He also considered the problem from the standpoint of a contingency table, as Stevens had done earlier.




In 1939 Kullback [14] considered matchings between two sequences obtained by drawing at random a single element in turn from each of $n$ urns $U_i$ containing elements of $r$ types $E_j$ in the respective proportions $p_{ij}$. He showed that if the process of drawing were indefinitely repeated, the distribution of hits would be that of a Poisson series.

The work which has been done thus far applies to the problem of matching 
two decks of cards. In the present paper a method is developed for obtaining 
the moments of the distribution of hits for matchings between three or more 
decks of cards of arbitrary composition. 


2. Matchings between two decks of cards. In the present paper it will be convenient to take as the point of departure the method used by Wilks [19] in his treatment of the problem of hits occurring under random matching of two decks of 25 cards each. These were a target deck with composition $(5^5)$ and a matching deck with composition $(s_1 s_2 \cdots s_5)$, $\sum s_i = 25$. He showed that

(1) $\phi = \frac{s_1!\, s_2! \cdots s_5!}{25!}\, (x_1 e^{\theta} + x_2 + \cdots + x_5)^5 (x_1 + x_2 e^{\theta} + x_3 + \cdots + x_5)^5 \cdots (x_1 + x_2 + \cdots + x_5 e^{\theta})^5$

is a suitable generating function for obtaining the moments of the distribution. In fact, define an operator $K_{s_1 s_2 \cdots s_5}$ by

(2) $K_{s_1 s_2 \cdots s_5}\, u = \text{coefficient of } x_1^{s_1} x_2^{s_2} \cdots x_5^{s_5} \text{ in } u,$

where $u = u(x_1, x_2, \cdots, x_5)$, and let $h$ denote the number of hits. Then for $r = 0, 1, 2, \cdots, 25$,

(3) $P(h = r) = \text{coefficient of } e^{r\theta} \text{ in } K_{s_1 s_2 \cdots s_5}\, \phi.$

It is also readily seen that

(4) $E(h^p) = K_{s_1 s_2 \cdots s_5} \left[\frac{\partial^p \phi}{\partial \theta^p}\right]_{\theta = 0}.$


Wilks' $\phi$ function involves a particular order for the target deck. If we are to generalize and obtain moments for matchings between more than two decks, we must devise a procedure which, in the case of two decks, is perfectly symmetrical and does not require that one deck be given a preferred status. In the case of two decks this is readily accomplished by the use of Kronecker deltas, and in the case of three or more decks by the use of obvious generalizations of these deltas.





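For two decks of 25 cards each with compositions $(5^5)$ we need only let

(5) $\phi = \Big(\sum_{i,j=1}^{5} x_i y_j e^{\delta_{ij}\theta}\Big)^{25},$

where $\delta_{ii} = 1$ and $\delta_{ij} = 0$ for $i \ne j$. Then, if

(6) $K_{n_{11} \cdots n_{15}\, n_{21} \cdots n_{25}}\, u = \text{coefficient of } x_1^{n_{11}} x_2^{n_{12}} \cdots x_5^{n_{15}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_5^{n_{25}} \text{ in } u,$

where $u = u(x_1, x_2, \cdots, x_5, y_1, y_2, \cdots, y_5)$, it readily follows that

(7) $E(h^p) = \frac{K_{n_{11} \cdots n_{25}} \left[\partial^p \phi / \partial \theta^p\right]_{\theta=0}}{K_{n_{11} \cdots n_{25}}\, \phi \big|_{\theta=0}}.$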


More generally, consider two decks of $n$ cards each, the cards being of $k$ types, with compositions $(n_{11}, n_{12}, \cdots, n_{1k})$ and $(n_{21}, n_{22}, \cdots, n_{2k})$ respectively. We let

(8) $\phi = u^n = \Big(\sum_{i,j=1}^{k} x_i y_j e^{\delta_{ij}\theta}\Big)^n.$


The factors of $\phi$ are in one-to-one correspondence with the $n$ events of dealing a card from each of the two decks. The values which can be assumed by the subscripts $i$ and $j$ are in one-to-one correspondence with the $k$ types of cards. The symbol $x_i$ corresponds to the first deck and $y_j$ to the second, the subscripts $i$ and $j$ corresponding to the different types of cards in each deck. The expansion of $\phi$ consists of all products which can be formed by choosing one and only one pair $x_\alpha y_\beta$ from each factor of $\phi$ as a factor of the product. In forming any term of $\phi$, choosing $x_\alpha y_\alpha$ from a factor of $\phi$ corresponds to dealing a card of type $\alpha$ from both decks, and introduces $e^{\theta}$ into the coefficient of the term. Choosing $x_\alpha y_\beta$ from a factor corresponds to dealing a card of type $\alpha$ from the first deck and one of type $\beta$ from the second. If $\alpha \ne \beta$, then $e^{\theta}$ is not introduced into the coefficient, since $\delta_{ij} = 0$ for $i \ne j$. Therefore in the coefficient of any term of $\phi$, $e^{\theta}$ is raised to a power, say $s$, equal to the number of factors of $\phi$ from which pairs $x_\alpha y_\alpha$ have been chosen.

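The total number of ways in which the term

$x_1^{n_{11}} x_2^{n_{12}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_k^{n_{2k}}$

can arise is equal to the number of ways in which two decks of types $(n_{1i})$, $(n_{2j})$ respectively can be dealt, where $(n_{1i}) = (n_{11} n_{12} \cdots n_{1k})$ and similarly for $(n_{2j})$. But this is given by

(9) $K_{n_{11} \cdots n_{1k}\, n_{21} \cdots n_{2k}}\, \phi \big|_{\theta=0} = K_{n_{11} \cdots n_{1k}\, n_{21} \cdots n_{2k}} \Big(\sum_i x_i\Big)^n \Big(\sum_j y_j\Big)^n = \frac{n!}{n_{11}! \cdots n_{1k}!} \cdot \frac{n!}{n_{21}! \cdots n_{2k}!}.$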


The coefficient of $e^{s\theta}$ in $K_{n_{11} \cdots n_{2k}}\, \phi$ is the total number of ways in which the term $x_1^{n_{11}} x_2^{n_{12}} \cdots x_k^{n_{1k}}\, y_1^{n_{21}} y_2^{n_{22}} \cdots y_k^{n_{2k}}$ can be formed, subject to the restriction that pairs $x_i y_j$ with $i = j$ are chosen from $s$ of the factors of $\phi$. But this is precisely the number of ways in which the two decks can be dealt so that there will be $s$ hits. Hence if, as above, $h$ is the number of hits, and all permutations in each deck are assumed equally likely, the probability that $h = s$ is given by

(10) $P(h = s) = \frac{\text{coefficient of } e^{s\theta} \text{ in } K_{n_{11} \cdots n_{2k}}\, \phi}{K_{n_{11} \cdots n_{2k}}\, \phi \big|_{\theta=0}}.$

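Since this is true for all values of $s$, it follows that

(11) $E(h^p) = \frac{K_{n_{11} \cdots n_{2k}} \left[\partial^p \phi / \partial \theta^p\right]_{\theta=0}}{K_{n_{11} \cdots n_{2k}}\, \phi \big|_{\theta=0}}.$

Since

$\left[\frac{\partial \phi}{\partial \theta}\right]_{\theta=0} = n u^{n-1} \Big[\sum_{i,j} \delta_{ij} x_i y_j e^{\delta_{ij}\theta}\Big]_{\theta=0} = n \Big[\sum_{i,j} x_i y_j\Big]^{n-1} \sum_i x_i y_i = n \Big(\sum_i x_i\Big)^{n-1} \Big(\sum_j y_j\Big)^{n-1} \sum_i x_i y_i,$

we have at once

(12) $E(h) = \frac{n}{K_{n_{11} \cdots n_{2k}}\, \phi|_{\theta=0}} \sum_{i=1}^{k} \frac{(n-1)!}{n_{11}! \cdots (n_{1i} - 1)! \cdots n_{1k}!} \cdot \frac{(n-1)!}{n_{21}! \cdots (n_{2i} - 1)! \cdots n_{2k}!} = \sum_{i=1}^{k} \frac{n_{1i} n_{2i}}{n}.$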


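It is an equally straightforward matter to show that

(13) $E(h^2) = \sum_{i} \frac{n_{1i} n_{2i}}{n} + \sum_{i} \frac{n_{1i}(n_{1i} - 1)\, n_{2i}(n_{2i} - 1)}{n(n-1)} + \sum_{i \ne j} \frac{n_{1i} n_{2i} n_{1j} n_{2j}}{n(n-1)}$

and that

(14) $\sigma_h^2 = \sum_{i} \left(\frac{n_{1i} n_{2i}}{n} - \frac{n_{1i}^2 n_{2i}^2}{n^2} + \frac{n_{1i}(n_{1i} - 1)\, n_{2i}(n_{2i} - 1)}{n(n-1)}\right) + \sum_{i \ne j} \frac{n_{1i} n_{2i} n_{1j} n_{2j}}{n^2 (n-1)}.$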


Evidently any of the $n_{1i}$ and $n_{2i}$ may be zero, provided only that $\sum_{i=1}^{k} n_{1i} = \sum_{i=1}^{k} n_{2i} = n$. The case of two decks with unequal numbers of cards $m$ and $n$ ($m < n$) is readily handled by substituting for the smaller deck one obtained by adding $n - m$ "blank" cards, that is, cards of a type not already appearing in either deck. This device was indicated by Greville [8], who however obtained his results by considering a preferred order for one of the decks.
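The counting argument behind (8) to (10) can be checked mechanically. The Python sketch below has two functions, both hypothetical helpers written for this note, not taken from the paper.

- `hit_distribution` expands $\phi = u^n$ one factor (one deal) at a time. It tracks the exponents of the $x_i$, the $y_j$ and $e^{\theta}$, and discards any partial term in which some $x_i$ or $y_j$ would exceed $n_{1i}$ or $n_{2j}$, since such terms cannot reach the coefficient selected by $K$. It then divides by $K\phi|_{\theta=0}$.
- `hit_distribution_brute` independently checks the result by enumerating every arrangement of the second deck.

Decks are given as lists of per-type counts with equal totals; unequal decks should first be padded with blank types as just described.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations


def hit_distribution(deck1, deck2):
    """Exact P(h = s) following (8)-(10): expand phi = u**n with
    u = sum_{i,j} x_i y_j e^{delta_ij theta} and extract the coefficient K."""
    k, n = len(deck1), sum(deck1)
    if len(deck2) != k or sum(deck2) != n:
        raise ValueError("decks need the same types and the same total (pad with blanks)")
    # partial term: (exponents of x, exponents of y, power s of e^theta) -> coefficient
    terms = {((0,) * k, (0,) * k, 0): 1}
    for _ in range(n):                      # one factor u of phi per deal
        nxt = defaultdict(int)
        for (ex, ey, s), c in terms.items():
            for i in range(k):
                if ex[i] == deck1[i]:
                    continue                # x_i already at its target power n_1i
                for j in range(k):
                    if ey[j] == deck2[j]:
                        continue            # y_j already at its target power n_2j
                    ex2 = ex[:i] + (ex[i] + 1,) + ex[i + 1:]
                    ey2 = ey[:j] + (ey[j] + 1,) + ey[j + 1:]
                    nxt[(ex2, ey2, s + (i == j))] += c   # delta_ij = 1 adds a hit
        terms = nxt
    # every surviving term now has exponents exactly deck1 and deck2
    counts = defaultdict(int)
    for (_, _, s), c in terms.items():
        counts[s] += c
    total = sum(counts.values())            # K phi at theta = 0
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


def hit_distribution_brute(deck1, deck2):
    """Same distribution by listing every arrangement of the (labelled) second deck."""
    seq1 = [t for t, c in enumerate(deck1) for _ in range(c)]
    seq2 = [t for t, c in enumerate(deck2) for _ in range(c)]
    counts = defaultdict(int)
    for perm in permutations(seq2):
        counts[sum(a == b for a, b in zip(seq1, perm))] += 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in sorted(counts.items())}


deck1, deck2 = [2, 1, 1], [1, 2, 1]
dist = hit_distribution(deck1, deck2)
mean = sum(s * p for s, p in dist.items())
print(dist, mean)   # the mean should equal sum(n_1i * n_2i) / n, as in (12)
```

Pruning keeps the expansion small for modest decks. The brute-force check, however, grows factorially and is only meant for tiny examples.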

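Example 1. In the case of the decks treated by Wilks [19], $n = 25$, $k = 5$, and $n_{1i} = n_{2i} = 5$. Hence from (12)

$E(h) = \sum_{i=1}^{5} \frac{5 \cdot 5}{25} = 5,$

and from (14)

$\sigma_h^2 = \sum_{i=1}^{5} \left(\frac{5 \cdot 5}{25} - \frac{25 \cdot 25}{(25)^2} + \frac{5 \cdot 4 \cdot 5 \cdot 4}{25 \cdot 24}\right) + \sum_{i \ne j} \frac{5 \cdot 5 \cdot 5 \cdot 5}{(25)^2\, 24} = \frac{25}{6}.$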



Example 2. Suppose we have two decks as shown by the scheme

                        Type of card           Total of
                    1    2    3    4    5      all types
    No. in deck A   5    7    8    0    0         20
    No. in deck B   0    3    4    6    2         15


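Here deck B has five fewer cards than deck A. Hence we must presume that there are six types of cards in all, and that the decks have the respective distributions (578000) and (034625). We then have at once

$E(h) = \sum_{i=1}^{6} \frac{n_{1i} n_{2i}}{n} = \frac{1}{20}\,[0 + 3 \cdot 7 + 4 \cdot 8 + 0 + 0 + 0] = 2.65,$

$\sigma_h^2 = \sum_{i=1}^{6} \left(\frac{n_{1i} n_{2i}}{n} - \frac{n_{1i}^2 n_{2i}^2}{n^2} + \frac{n_{1i}(n_{1i} - 1)\, n_{2i}(n_{2i} - 1)}{n(n-1)}\right) + \sum_{i \ne j} \frac{n_{1i} n_{2i} n_{1j} n_{2j}}{n^2 (n-1)}$

$\quad = 2.65 - \frac{1}{400}(3^2 \cdot 7^2 + 4^2 \cdot 8^2) + \frac{1}{380}(3 \cdot 2 \cdot 7 \cdot 6 + 4 \cdot 3 \cdot 8 \cdot 7) + \frac{1}{400 \cdot 19}(3 \cdot 7 \cdot 4 \cdot 8 + 4 \cdot 8 \cdot 3 \cdot 7) = \frac{12129}{7600} \approx 1.596.$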
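Formulas (12) and (14), together with the blank-card padding, translate directly into exact rational arithmetic. The sketch below is a minimal Python version; `pad_with_blanks` and `mean_and_variance` are hypothetical names. It reproduces the values of Examples 1 and 2.

```python
from fractions import Fraction


def pad_with_blanks(deck1, deck2):
    """Equalize totals: give the smaller deck extra cards of a new 'blank' type
    that appears in neither deck (the larger deck gets a count of 0 for it)."""
    d1, d2 = list(deck1), list(deck2)
    gap = sum(d1) - sum(d2)
    if gap > 0:
        d1.append(0)
        d2.append(gap)
    elif gap < 0:
        d1.append(-gap)
        d2.append(0)
    return d1, d2


def mean_and_variance(deck1, deck2):
    """E(h) from (12) and sigma_h^2 from (14), computed exactly."""
    d1, d2 = pad_with_blanks(deck1, deck2)
    n = sum(d1)
    mean = Fraction(sum(a * b for a, b in zip(d1, d2)), n)
    var = Fraction(0)
    for a, b in zip(d1, d2):                         # single-type sum in (14)
        var += (Fraction(a * b, n) - Fraction(a * a * b * b, n * n)
                + Fraction(a * (a - 1) * b * (b - 1), n * (n - 1)))
    for i, (a, b) in enumerate(zip(d1, d2)):         # sum over i != j in (14)
        for j, (c, d) in enumerate(zip(d1, d2)):
            if i != j:
                var += Fraction(a * b * c * d, n * n * (n - 1))
    return mean, var


print(mean_and_variance([5] * 5, [5] * 5))                   # Example 1: 5 and 25/6
print(mean_and_variance([5, 7, 8, 0, 0], [0, 3, 4, 6, 2]))   # Example 2: 53/20 and 12129/7600
```

Fractions keep the results exact, so 2.65 appears as 53/20 and the variance in Example 2 as 12129/7600.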





3. Matchings between three decks. Let the three decks be of types $(n_{11} n_{12} \cdots n_{1q})$, $(n_{21} n_{22} \cdots n_{2q})$, $(n_{31} n_{32} \cdots n_{3q})$ respectively, with $\sum_{j=1}^{q} n_{1j} = \sum_{j=1}^{q} n_{2j} = \sum_{j=1}^{q} n_{3j} = n$, and consider the function

(15) $\phi = \Big[\sum_{i,j,k=1}^{q} x_i y_j z_k\, e^{\delta_{ijk}\theta_{123} + \delta_{ij}\theta_{12} + \delta_{ik}\theta_{13} + \delta_{jk}\theta_{23}}\Big]^n = u^n,$

where

(16) $\delta_{iii} = 1, \qquad \delta_{ijk} = 0 \text{ for } i, j, k \text{ not all equal},$

and the other deltas are the usual Kronecker symbols.

Each factor of $\phi$ corresponds to one deal from each of the three decks. The symbols $x$, $y$, and $z$ correspond respectively to cards in the first, second, and third decks. The subscripts $i, j, k = 1, 2, \cdots, q$ correspond to the types of cards, there being $q$ distinct types.

Choosing $x_\alpha y_\alpha z_\alpha$ from a factor of $\phi$ corresponds to a deal in which a card of type $\alpha$ is dealt from all three decks, and introduces $e^{\theta_{123} + \theta_{12} + \theta_{13} + \theta_{23}}$ into the coefficient of the corresponding term in the expansion of $\phi$. Similarly, choosing $x_\alpha y_\alpha z_\beta$, $\beta \ne \alpha$, corresponds to a hit between the first and second decks, and introduces $e^{\theta_{12}}$ into the coefficient. Likewise, choosing $x_\alpha y_\beta z_\alpha$ introduces $e^{\theta_{13}}$, and choosing $x_\beta y_\alpha z_\alpha$ introduces $e^{\theta_{23}}$. Choosing $x_\alpha y_\beta z_\gamma$ with $\alpha \ne \beta \ne \gamma \ne \alpha$ corresponds to a deal with no hits. It introduces no powers of $e$ into the coefficient, since all the $\delta$'s are zero.

Let $K_{n_{1i} n_{2j} n_{3k}}\, u$ be defined by

(17) $K_{n_{1i} n_{2j} n_{3k}}\, u = \text{coefficient of } x_1^{n_{11}} \cdots x_q^{n_{1q}}\, y_1^{n_{21}} \cdots y_q^{n_{2q}}\, z_1^{n_{31}} \cdots z_q^{n_{3q}} \text{ in } u.$

Then the coefficient of $e^{h_{123}\theta_{123}}$ in $K_{n_{1i} n_{2j} n_{3k}}\, \phi \big|_{\theta_{12} = \theta_{13} = \theta_{23} = 0}$ is the number of ways in which the cards can be dealt so as to yield precisely $h_{123}$ triples, or hits between all three decks. Similarly, the coefficient of $e^{h_{12}\theta_{12}}$ in $K_{n_{1i} n_{2j} n_{3k}}\, \phi \big|_{\theta_{123} = \theta_{13} = \theta_{23} = 0}$ is the number of ways in which the cards can be dealt so as to yield precisely $h_{12}$ hits between the first and second decks. Corresponding results hold for the first and third ($h_{13}$) and the second and third ($h_{23}$) decks.

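By the same reasoning as before, then, we have

(18) $E(h_{123}^p) = \frac{K_{n_{1i} n_{2j} n_{3k}} \left[\partial^p \phi / \partial \theta_{123}^p\right]_{\theta = 0}}{K_{n_{1i} n_{2j} n_{3k}}\, \phi \big|_{\theta = 0}},$

(19) $E(h_{12}^p) = \frac{K_{n_{1i} n_{2j} n_{3k}} \left[\partial^p \phi / \partial \theta_{12}^p\right]_{\theta = 0}}{K_{n_{1i} n_{2j} n_{3k}}\, \phi \big|_{\theta = 0}},$

with similar results for $h_{13}$ and $h_{23}$. Here $\theta = 0$ means $\theta_{123} = \theta_{12} = \theta_{13} = \theta_{23} = 0$. It is then a straightforward matter to show that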


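(20) $E(h_{123}) = n \sum_{i=1}^{q} \prod_{\alpha=1}^{3} \frac{n_{\alpha i}}{n},$

(21) $E(h_{123}^2) = n \sum_{i} \prod_{\alpha=1}^{3} \frac{n_{\alpha i}}{n} + n(n-1) \sum_{i} \prod_{\alpha=1}^{3} \frac{n_{\alpha i}^{(2)}}{n^{(2)}} + n(n-1) \sum_{i \ne j} \prod_{\alpha=1}^{3} \frac{n_{\alpha i} n_{\alpha j}}{n^{(2)}},$

(22) $E(h_{12}) = \frac{1}{n^2} \sum_{i,k} n_{1i} n_{2i} n_{3k},$

(23) $E(h_{13}) = \frac{1}{n^2} \sum_{i,k} n_{1i} n_{2k} n_{3i},$

(24) $E(h_{23}) = \frac{1}{n^2} \sum_{i,j} n_{1j} n_{2i} n_{3i},$

(25) $E(h_{12}^2) = \frac{1}{n^2} \sum_{i,k} n_{1i} n_{2i} n_{3k} + \frac{1}{n^2 (n-1)^2} \Big[\sum_{i,k} n_{1i}^{(2)} n_{2i}^{(2)} n_{3k}^{(2)} + \sum_{i;\, k \ne r} n_{1i}^{(2)} n_{2i}^{(2)} n_{3k} n_{3r} + \sum_{i \ne j;\, k} n_{1i} n_{1j} n_{2i} n_{2j} n_{3k}^{(2)} + \sum_{i \ne j;\, k \ne r} n_{1i} n_{1j} n_{2i} n_{2j} n_{3k} n_{3r}\Big],$

where $a^{(r)} = a(a-1)\cdots(a-r+1)$. Corresponding results hold for other moments. It is understood that each summation index takes values from 1 to $q$.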

As before, if the decks do not all have the same total number of cards, it is merely necessary to introduce one or more sets of "blank" cards. Thus we would replace decks with the compositions (57800), (03462), (00335) by hypothetical decks (5780000), (0346250), (0033509), and proceed as before.

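Example 3. For three decks of 25 cards, each consisting of five of each of five kinds, we have $n = 25$ and $n_{\alpha i} = 5$ for $\alpha = 1, 2, 3$ and $i = 1, 2, \cdots, 5$. Hence

$E(h_{123}) = 25 \sum_{i=1}^{5} \prod_{\alpha=1}^{3} \frac{5}{25} = 1,$

$E(h_{123}^2) = 25 \sum_{i=1}^{5} \left(\frac{1}{5}\right)^3 + 25 \cdot 24 \sum_{i=1}^{5} \left(\frac{1}{30}\right)^3 + 25 \cdot 24 \sum_{i \ne j} \left(\frac{1}{24}\right)^3 = \frac{95}{48},$

$\sigma_{h_{123}}^2 = \frac{47}{48},$

$E(h_{12}) = \frac{1}{(25)^2} \sum_{i,k} 5 \cdot 5 \cdot 5 = 5,$

$E(h_{12}^2) = \frac{1}{(25)^2} \sum_{i,k} 5^3 + \frac{1}{(25)^2 (24)^2} \Big[\sum_{i,k} 5^3 \cdot 4^3 + \sum_{i;\, k \ne r} 5^4 \cdot 4^2 + \sum_{i \ne j;\, k} 5^5 \cdot 4 + \sum_{i \ne j;\, k \ne r} 5^6\Big] = 29\tfrac{1}{6},$

$\sigma_{h_{12}}^2 = 4\tfrac{1}{6},$

with similar results for $E(h_{13})$, $E(h_{23})$, $\sigma_{h_{13}}^2$, and $\sigma_{h_{23}}^2$.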
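The three-deck formulas (20), (21), (22) and (25) can be coded with exact fractions. They can then be checked against a brute-force count over all arrangements of decks 2 and 3 for a tiny example. The Python sketch below does both. The function names are hypothetical, and `ff` is the falling factorial $a^{(r)}$. For Example 3 it gives $E(h_{123}) = 1$ and $E(h_{123}^2) = 95/48$, hence $\sigma^2 = 47/48$. It also gives $E(h_{12}) = 5$ and $E(h_{12}^2) = 29\tfrac{1}{6}$.

```python
from fractions import Fraction
from itertools import permutations


def ff(a, r):
    """Falling factorial a^(r) = a (a-1) ... (a-r+1)."""
    out = 1
    for t in range(r):
        out *= a - t
    return out


def triple_moments(d1, d2, d3):
    """(E h_123, E h_123^2, E h_12, E h_12^2) from (20), (21), (22), (25)."""
    n, q = sum(d1), len(d1)
    decks = (d1, d2, d3)
    e123 = sum(Fraction(d1[i] * d2[i] * d3[i], n * n) for i in range(q))
    e123_sq = e123
    for i in range(q):
        for j in range(q):
            term = Fraction(n * (n - 1))
            for d in decks:
                num = ff(d[i], 2) if i == j else d[i] * d[j]
                term *= Fraction(num, n * (n - 1))
            e123_sq += term
    e12 = sum(Fraction(d1[i] * d2[i] * d3[k], n * n) for i in range(q) for k in range(q))
    bracket = 0
    for i in range(q):
        for j in range(q):
            xy = ff(d1[i], 2) * ff(d2[i], 2) if i == j else d1[i] * d1[j] * d2[i] * d2[j]
            for k in range(q):
                for r in range(q):
                    bracket += xy * (ff(d3[k], 2) if k == r else d3[k] * d3[r])
    e12_sq = e12 + Fraction(bracket, n * n * (n - 1) ** 2)
    return e123, e123_sq, e12, e12_sq


def brute_moments(d1, d2, d3):
    """Same four moments by listing every arrangement of decks 2 and 3 (deck 1 fixed)."""
    s1, s2, s3 = ([t for t, c in enumerate(d) for _ in range(c)] for d in (d1, d2, d3))
    sums, count = [0, 0, 0, 0], 0
    for p2 in permutations(s2):
        for p3 in permutations(s3):
            h123 = sum(a == b == c for a, b, c in zip(s1, p2, p3))
            h12 = sum(a == b for a, b in zip(s1, p2))
            for idx, v in enumerate((h123, h123 ** 2, h12, h12 ** 2)):
                sums[idx] += v
            count += 1
    return tuple(Fraction(v, count) for v in sums)


print(triple_moments([5] * 5, [5] * 5, [5] * 5))   # Example 3
```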


4. Generalization to any number of decks. If the moments of the distribution of hits (doubles, triples, quadruples, ...) in matching any number of decks are desired, they can be obtained by using an obvious generalization of (15). Thus for four decks we would define $\delta_{iiii} = 1$ and $\delta_{ijkl} = 0$ for $i, j, k, l$ not all equal, and use

(26) $\phi = \Big[\sum_{i,j,k,l=1}^{q} x_i y_j z_k w_l\, e^{\delta_{ijkl}\theta_{1234} + \delta_{ijk}\theta_{123} + \cdots + \delta_{kl}\theta_{34}}\Big]^n.$

However, as the number of decks is increased, the summations involved and the manipulation of the (generalized) $K$ operators evidently become complicated very rapidly.
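The generalized deltas have a simple reading. In any single deal, $\theta_S$ enters the exponent exactly when every deck in the subset $S$ shows the same type. By linearity of expectation, the natural extension of (20) and (22) is $E(h_S) = n \sum_i \prod_{\alpha \in S} n_{\alpha i}/n$, since each deal matches on all of $S$ with probability $\sum_i \prod_{\alpha \in S} n_{\alpha i}/n$. The Python sketch below checks this by exact enumeration for four small decks. It is offered as an illustration rather than the author's procedure, and all names in it are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def exponent_by_subset(cards):
    """For one deal (one card type per deck), list the subsets S of decks (size >= 2)
    whose generalized delta is 1, i.e. all decks in S show the same type;
    each such S contributes theta_S to the exponent of e."""
    m = len(cards)
    return [S for size in range(2, m + 1) for S in combinations(range(m), size)
            if len({cards[a] for a in S}) == 1]


def exact_mean_hits(decks):
    """E(h_S) for every subset S, enumerating all arrangements of decks 2..m (deck 1 fixed)."""
    seqs = [[t for t, c in enumerate(d) for _ in range(c)] for d in decks]
    n = len(seqs[0])
    totals, count = {}, 0
    for rest in product(*(permutations(s) for s in seqs[1:])):
        rows = [seqs[0]] + list(rest)
        count += 1
        for t in range(n):
            for S in exponent_by_subset([row[t] for row in rows]):
                totals[S] = totals.get(S, 0) + 1
    return {S: Fraction(v, count) for S, v in totals.items()}


def formula_mean_hits(decks, S):
    """n * sum_i prod_{a in S} (n_ai / n), extending (20) and (22) to any subset S."""
    n = sum(decks[0])
    total = Fraction(0)
    for i in range(len(decks[0])):
        term = Fraction(n)
        for a in S:
            term *= Fraction(decks[a][i], n)
        total += term
    return total


decks = [[1, 1, 1], [1, 1, 1], [2, 1, 0], [1, 0, 2]]
print(exact_mean_hits(decks)[(0, 1, 2, 3)], formula_mean_hits(decks, (0, 1, 2, 3)))
```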


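5. Application of our moment-generating technique to two-way contingency tables. The moment-generating technique which we have discussed has wider applications than merely to matching problems. As an example of considerable interest we shall consider the contingency problem. Consider the array

(27) $\| n_{\alpha\beta} \|, \qquad \alpha = 1, 2, \cdots, r; \quad \beta = 1, 2, \cdots, t,$

with row totals $n_{\alpha\cdot} = \sum_\beta n_{\alpha\beta}$, column totals $n_{\cdot\beta} = \sum_\alpha n_{\alpha\beta}$, and $\sum_{\alpha,\beta} n_{\alpha\beta} = n$. Consider also the function

(28) $\phi = \prod_{\alpha=1}^{r} \Big(\sum_{\beta=1}^{t} x_\beta e^{\theta_{\alpha\beta}}\Big)^{n_{\alpha\cdot}}.$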


Let $i$ and $j$ be particular values of $\alpha$ and $\beta$ respectively. Then to the $i$-th row of the array corresponds the product $(\sum_\beta x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$, consisting of $n_{i\cdot}$ identical factors $\sum_\beta x_\beta e^{\theta_{i\beta}}$, one for each of the $n_{i\cdot}$ elements in the row. To the $j$-th column of the array corresponds the $x_j$ which appears in each of the factors of $\phi$. To the $ij$-th cell of the array corresponds $e^{\theta_{ij}}$. It appears only in the factors $(\sum_\beta x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$, and in each of these only as the coefficient of $x_j$.


The expansion of $\phi$ consists of all products which can be formed by taking as factors one and only one element $x_\beta e^{\theta_{\alpha\beta}}$ (not summed) from each factor of $\phi$. Taking $x_j e^{\theta_{ij}}$ from one of the factors of $(\sum_\beta x_\beta e^{\theta_{i\beta}})^{n_{i\cdot}}$ corresponds exactly to putting an element in the $ij$-th cell of a lattice such as (27). Hence every term in the expansion of $\phi$ corresponds to a particular distribution in such a lattice. Moreover, all terms of $\phi$ correspond to distributions in which the row totals are $n_{\alpha\cdot}$, for we must take $n_{\alpha\cdot}$ elements from the product $(\sum_\beta x_\beta e^{\theta_{\alpha\beta}})^{n_{\alpha\cdot}}$. Further, those terms in which the $x_\beta$ appear in the particular product $x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_t^{n_{\cdot t}}$ correspond to distributions in which the column totals are $n_{\cdot 1}, n_{\cdot 2}, \cdots, n_{\cdot t}$. This is because choosing $n_{\cdot j}$ elements $x_j e^{\theta_{\alpha j}}$ corresponds to putting $n_{\cdot j}$ elements in the $j$-th column, each in some row of the lattice.

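Expanding $\phi$ we obtain

(29) $\phi = \cdots + \Big[\sum \Big(\prod_{\alpha=1}^{r} \frac{n_{\alpha\cdot}!}{n_{\alpha 1}!\, n_{\alpha 2}! \cdots n_{\alpha t}!}\Big) e^{\sum_{\alpha,\beta} n_{\alpha\beta}\theta_{\alpha\beta}}\Big] x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_t^{n_{\cdot t}} + \cdots,$

where the summation is over all partitions $(n_{\alpha 1} n_{\alpha 2} \cdots n_{\alpha t})$ of the $n_{\alpha\cdot}$ such that $(n_{1\beta} n_{2\beta} \cdots n_{r\beta})$ is also a partition of $n_{\cdot\beta}$. Every set of values of the $n_{\alpha\beta}$ subject to the partition restrictions $\sum_\beta n_{\alpha\beta} = n_{\alpha\cdot}$, $\sum_\alpha n_{\alpha\beta} = n_{\cdot\beta}$ corresponds to a particular distribution of $n$ elements in the lattice (27). It is therefore clear that every particular product

$\prod_{\alpha=1}^{r} \frac{n_{\alpha\cdot}!}{n_{\alpha 1}! \cdots n_{\alpha t}!}$

corresponds to such a distribution, and represents the number of ways in which it can arise. Further, the total coefficient displayed in (29) with all $\theta_{\alpha\beta} = 0$, namely $\sum \prod_{\alpha=1}^{r} n_{\alpha\cdot}!/(n_{\alpha 1}! \cdots n_{\alpha t}!)$, represents the total number of ways in which distributions with row totals $n_{\alpha\cdot}$ and column totals $n_{\cdot\beta}$ can arise. Setting all the $\theta_{\alpha\beta} = 0$ we readily find

(30) $\sum \prod_{\alpha=1}^{r} \frac{n_{\alpha\cdot}!}{n_{\alpha 1}! \cdots n_{\alpha t}!} = \text{coefficient of } x_1^{n_{\cdot 1}} \cdots x_t^{n_{\cdot t}} \text{ in } (x_1 + x_2 + \cdots + x_t)^n = \frac{n!}{n_{\cdot 1}!\, n_{\cdot 2}! \cdots n_{\cdot t}!}.$

Hence the probability of any particular distribution $\| n_{\alpha\beta} \|$ with fixed row totals $n_{\alpha\cdot}$ and fixed column totals $n_{\cdot\beta}$ is

(31) $P(\| n_{\alpha\beta} \| \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{\prod_\alpha n_{\alpha\cdot}!\, \prod_\beta n_{\cdot\beta}!}{n!\, \prod_{\alpha,\beta} n_{\alpha\beta}!}.$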
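Equations (30) and (31) can be checked on a small array by listing every table with the prescribed margins. The Python sketch below does this with hypothetical helper names. It confirms two things. The sum of the products of row multinomials over all admissible tables equals $n!/(n_{\cdot 1}! \cdots n_{\cdot t}!)$, and the probabilities (31) sum to one.

```python
from fractions import Fraction
from math import factorial, prod


def tables(rows, cols):
    """Yield every r x t array of non-negative integers with row totals `rows`
    and column totals `cols` (assumes sum(rows) == sum(cols))."""
    if len(rows) == 1:
        yield [list(cols)]
        return

    def first_rows(j, remaining, row):
        # choose n_1j, n_1(j+1), ... for the first row without exceeding any column total
        if j == len(cols) - 1:
            if remaining <= cols[j]:
                yield row + [remaining]
            return
        for v in range(min(remaining, cols[j]) + 1):
            yield from first_rows(j + 1, remaining - v, row + [v])

    for row in first_rows(0, rows[0], []):
        for sub in tables(rows[1:], [c - v for c, v in zip(cols, row)]):
            yield [row] + sub


def multinomial(total, parts):
    return factorial(total) // prod(factorial(p) for p in parts)


def prob(tab, rows, cols):
    """Equation (31)."""
    num = prod(factorial(a) for a in rows) * prod(factorial(b) for b in cols)
    return Fraction(num, factorial(sum(rows)) * prod(factorial(x) for r in tab for x in r))


rows, cols = [2, 3], [1, 2, 2]
all_tables = list(tables(rows, cols))
# left member of (30): sum over tables of prod_alpha n_alpha.! / (n_alpha1! ... n_alphat!)
ways = sum(prod(multinomial(r, tab_row) for r, tab_row in zip(rows, tab)) for tab in all_tables)
print(len(all_tables), ways, multinomial(sum(cols), cols))
```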



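Moments of the $n_{\alpha\beta}$. Consider now the result of differentiating $\phi$ with respect to a particular $\theta_{\alpha\beta}$, say $\theta_{ij}$. We obtain

(32) $\frac{\partial \phi}{\partial \theta_{ij}} = \cdots + \Big[\sum n_{ij} \Big(\prod_{\alpha=1}^{r} \frac{n_{\alpha\cdot}!}{n_{\alpha 1}! \cdots n_{\alpha t}!}\Big) e^{\sum_{\alpha,\beta} n_{\alpha\beta}\theta_{\alpha\beta}}\Big] x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_t^{n_{\cdot t}} + \cdots,$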


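where $\sum$ denotes summation over indices such that $\sum_\beta n_{\alpha\beta} = n_{\alpha\cdot}$ and $\sum_\alpha n_{\alpha\beta} = n_{\cdot\beta}$. Now $n_{ij} \le \min(n_{i\cdot}, n_{\cdot j})$, but also $n_{ij}$ can never be less than $n_{\cdot j} - (n - n_{i\cdot})$. For $n_{\cdot j} = n_{ij} + \sum_{\alpha \ne i} n_{\alpha j}$. Since $n_{\alpha j} \le n_{\alpha\cdot}$, we have $\sum_{\alpha \ne i} n_{\alpha j} \le \sum_{\alpha \ne i} n_{\alpha\cdot}$. Hence

$n_{ij} = n_{\cdot j} - \sum_{\alpha \ne i} n_{\alpha j} \ge n_{\cdot j} - \sum_{\alpha \ne i} n_{\alpha\cdot} = n_{\cdot j} - (n - n_{i\cdot}).$

Therefore

$\max(0,\; n_{\cdot j} - n + n_{i\cdot}) \le n_{ij} \le \min(n_{i\cdot}, n_{\cdot j}).$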

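Accordingly, combining all the terms of (32) in which $n_{ij}$ has a particular value $\gamma$, we have

(33) $\frac{\partial \phi}{\partial \theta_{ij}} = \cdots + \Big[\sum_{\gamma = \max(0,\, n_{\cdot j} - n + n_{i\cdot})}^{\min(n_{i\cdot},\, n_{\cdot j})} \gamma \sum{}^{*} \prod{}^{*}_{\alpha} \frac{n_{\alpha\cdot}!}{n_{\alpha 1}! \cdots n_{\alpha t}!}\, e^{\sum_{\alpha,\beta} n_{\alpha\beta}\theta_{\alpha\beta}}\Big] x_1^{n_{\cdot 1}} x_2^{n_{\cdot 2}} \cdots x_t^{n_{\cdot t}} + \cdots,$

where $\sum^*$ denotes summation and $\prod^*$ multiplication with $n_{ij} = \gamma$.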

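Since $\sum^* \prod^*_\alpha n_{\alpha\cdot}!/(n_{\alpha 1}! \cdots n_{\alpha t}!)$ is precisely the number of distributions $\| n_{\alpha\beta} \|$ for which $n_{ij} = \gamma$, it follows that

(34) $E(n_{ij} \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{n_{\cdot 1}! \cdots n_{\cdot t}!}{n!} \times \text{coefficient of } x_1^{n_{\cdot 1}} \cdots x_t^{n_{\cdot t}} \text{ in } \left[\frac{\partial \phi}{\partial \theta_{ij}}\right]_{\theta_{\alpha\beta} = 0}.$

Similarly it follows that

(35) $E(n_{ij}^2 \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{n_{\cdot 1}! \cdots n_{\cdot t}!}{n!} \times \text{coefficient of } x_1^{n_{\cdot 1}} \cdots x_t^{n_{\cdot t}} \text{ in } \left[\frac{\partial^2 \phi}{\partial \theta_{ij}^2}\right]_{\theta_{\alpha\beta} = 0},$

(36) $E(n_{ij} n_{kl} \mid n_{\alpha\cdot}, n_{\cdot\beta}) = \frac{n_{\cdot 1}! \cdots n_{\cdot t}!}{n!} \times \text{coefficient of } x_1^{n_{\cdot 1}} \cdots x_t^{n_{\cdot t}} \text{ in } \left[\frac{\partial^2 \phi}{\partial \theta_{ij}\, \partial \theta_{kl}}\right]_{\theta_{\alpha\beta} = 0},$

where we may have $k = i$ or $k \ne i$, and $l = j$ or $l \ne j$.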

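By straightforward differentiation and reduction we find the following results for the array (27) with given marginal totals $n_{\alpha\cdot}$, $n_{\cdot\beta}$, writing $a^{(r)} = a(a-1)\cdots(a-r+1)$:

(37) $E(n_{ij}) = \frac{n_{i\cdot} n_{\cdot j}}{n},$

(38) $E(n_{ij}^2) = \frac{n_{i\cdot}^{(2)} n_{\cdot j}^{(2)}}{n^{(2)}} + \frac{n_{i\cdot} n_{\cdot j}}{n},$

(39) $\sigma_{n_{ij}}^2 = \frac{[n^2 - n(n_{i\cdot} + n_{\cdot j}) + n_{i\cdot} n_{\cdot j}]\, n_{i\cdot} n_{\cdot j}}{n^2 (n-1)},$

(40) $E(n_{ij}^3) = \frac{n_{i\cdot}^{(3)} n_{\cdot j}^{(3)}}{n^{(3)}} + 3\, \frac{n_{i\cdot}^{(2)} n_{\cdot j}^{(2)}}{n^{(2)}} + \frac{n_{i\cdot} n_{\cdot j}}{n},$

(41) $E(n_{ij}^4) = \frac{n_{i\cdot}^{(4)} n_{\cdot j}^{(4)}}{n^{(4)}} + 6\, \frac{n_{i\cdot}^{(3)} n_{\cdot j}^{(3)}}{n^{(3)}} + 7\, \frac{n_{i\cdot}^{(2)} n_{\cdot j}^{(2)}}{n^{(2)}} + \frac{n_{i\cdot} n_{\cdot j}}{n},$

and, if $i$ and $k$ are distinct and $j$ and $l$ are distinct,

(42) $E(n_{ij}^2 n_{kj}^2) = \frac{n_{i\cdot}^{(2)} n_{k\cdot}^{(2)} n_{\cdot j}^{(4)}}{n^{(4)}} + (n_{i\cdot}^{(2)} n_{k\cdot} + n_{i\cdot} n_{k\cdot}^{(2)})\, \frac{n_{\cdot j}^{(3)}}{n^{(3)}} + \frac{n_{i\cdot} n_{k\cdot} n_{\cdot j}^{(2)}}{n^{(2)}},$

(43) $E(n_{ij}^2 n_{il}^2) = \frac{n_{i\cdot}^{(4)} n_{\cdot j}^{(2)} n_{\cdot l}^{(2)}}{n^{(4)}} + (n_{\cdot j}^{(2)} n_{\cdot l} + n_{\cdot j} n_{\cdot l}^{(2)})\, \frac{n_{i\cdot}^{(3)}}{n^{(3)}} + \frac{n_{i\cdot}^{(2)} n_{\cdot j} n_{\cdot l}}{n^{(2)}},$

(44) $E(n_{ij}^2 n_{kl}^2) = \frac{n_{i\cdot}^{(2)} n_{k\cdot}^{(2)} n_{\cdot j}^{(2)} n_{\cdot l}^{(2)}}{n^{(4)}} + \frac{n_{i\cdot}^{(2)} n_{k\cdot} n_{\cdot j}^{(2)} n_{\cdot l} + n_{i\cdot} n_{k\cdot}^{(2)} n_{\cdot j} n_{\cdot l}^{(2)}}{n^{(3)}} + \frac{n_{i\cdot} n_{k\cdot} n_{\cdot j} n_{\cdot l}}{n^{(2)}}.$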

Moments of the distribution of Chi Square. For the array (27)

(45) $\chi^2 = \sum_{\alpha,\beta} \frac{(n_{\alpha\beta} - n_{\alpha\cdot} n_{\cdot\beta}/n)^2}{n_{\alpha\cdot} n_{\cdot\beta}/n} = \sum_{\alpha,\beta} \left[\frac{n\, n_{\alpha\beta}^2}{n_{\alpha\cdot} n_{\cdot\beta}} - 2 n_{\alpha\beta} + \frac{n_{\alpha\cdot} n_{\cdot\beta}}{n}\right].$

Hence, using the above results, we can in principle find all the moments of the exact distribution of $\chi^2$. It is not difficult to show that

(46) $E(\chi^2) = \frac{n}{n-1}\,(r - 1)(t - 1).$

The value of $E[(\chi^2)^2]$ and the variance of $\chi^2$ were found by straightforward application of our methods, and the results agreed with those given by Haldane [10].
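For a small array, (37), (38), (39) and (46) can be confirmed by summing over every table with the given margins, weighting each table by (31). The Python sketch below is a self-contained illustration with hypothetical function names. It assumes all margins are positive, so that $\chi^2$ is defined.

```python
from fractions import Fraction
from math import factorial, prod


def tables(rows, cols):
    """All arrays of non-negative integers with the given row and column totals."""
    if len(rows) == 1:
        yield [list(cols)]
        return

    def first_rows(j, remaining, row):
        if j == len(cols) - 1:
            if remaining <= cols[j]:
                yield row + [remaining]
            return
        for v in range(min(remaining, cols[j]) + 1):
            yield from first_rows(j + 1, remaining - v, row + [v])

    for row in first_rows(0, rows[0], []):
        for sub in tables(rows[1:], [c - v for c, v in zip(cols, row)]):
            yield [row] + sub


def prob(tab, rows, cols):
    """Equation (31)."""
    num = prod(factorial(a) for a in rows) * prod(factorial(b) for b in cols)
    return Fraction(num, factorial(sum(rows)) * prod(factorial(x) for r in tab for x in r))


def exact_moments(rows, cols, i, j):
    """Exact E(n_ij), E(n_ij^2) and E(chi^2), with chi^2 written as in (45)."""
    n = sum(rows)
    e1 = e2 = echi = Fraction(0)
    for tab in tables(rows, cols):
        p = prob(tab, rows, cols)
        chi2 = n * sum(Fraction(tab[a][b] ** 2, rows[a] * cols[b])
                       for a in range(len(rows)) for b in range(len(cols))) - n
        e1 += p * tab[i][j]
        e2 += p * tab[i][j] ** 2
        echi += p * chi2
    return e1, e2, echi


def ff(a, r):
    out = 1
    for s in range(r):
        out *= a - s
    return out


def formula_moments(rows, cols, i, j):
    """Right members of (37), (38) and (46)."""
    n, r, t = sum(rows), len(rows), len(cols)
    e1 = Fraction(rows[i] * cols[j], n)
    e2 = Fraction(ff(rows[i], 2) * ff(cols[j], 2), ff(n, 2)) + e1
    return e1, e2, Fraction(n * (r - 1) * (t - 1), n - 1)


print(exact_moments([2, 3], [1, 2, 2], 0, 1), formula_moments([2, 3], [1, 2, 2], 0, 1))
```

Exhaustive enumeration grows quickly with $n$, so this is a check of the formulas on toy margins, not a practical way to compute moments.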

The writer is indebted to Professor Wilks for helpful criticisms and suggestions. 


REFERENCES

[1] M. S. Bartlett, "Properties of sufficiency and statistical tests," Proc. Roy. Soc., A, Vol. 160 (1937), pp. 268-282.

[2] Dwight Chapman, "The statistics of the method of correct matchings," Amer. Jour. Psych., Vol. 46 (1934), pp. 287-298.

[3] Dwight Chapman, "The generalized problem of correct matchings," Annals of Math. Stat., Vol. 6 (1935), pp. 85-95.

[4] J. A. Greenwood, "Variance of a general matching problem," Annals of Math. Stat., Vol. 9 (1938), pp. 56-59.

[5] J. A. Greenwood, "Variance of the ESP call series," Jour. of Parapsychology, Vol. 2 (1938), pp. 60-64.

[6] J. A. Greenwood, "The first four moments of a general matching problem," Annals of Eug., Vol. 10 (1940), pp. 290-292.

[7] T. N. E. Greville, "Exact probabilities for the matching hypothesis," Jour. of Parapsychology, Vol. 2 (1938), pp. 55-59.

[8] T. N. E. Greville, "The frequency distribution of a general matching problem," Annals of Math. Stat., Vol. 12 (1941), pp. 350-354.

[9] J. B. S. Haldane, "The mean and variance of Chi square, when used as a test of homogeneity, when expectations are small," Biometrika, Vol. 31 (1940).

[10] J. B. S. Haldane, "The first six moments of Chi-square for an n-fold table with n degrees of freedom when some expectations are small," Biometrika, Vol. 29 (1939), p. 389.

[11] J. B. S. Haldane, "The exact value of the moments of the distribution of chi-square, used as a test of goodness of fit, when the expectations are small," Biometrika, Vol. 29 (1939), p. 133.

[12] E. V. Huntington, "A rating table for card matching experiments," Jour. of Parapsychology, Vol. 4 (1937), pp. 292-294.

[13] E. V. Huntington, "Exact probabilities in certain card-matching problems," Science, Vol. 86 (1937), pp. 499-500.

[14] Solomon Kullback, "Note on a matching problem," Annals of Math. Stat., Vol. 10 (1939), pp. 77-80.

[15] E. G. Olds, "A moment-generating function which is useful in solving certain matching problems," Bull. Amer. Math. Soc., Vol. 44 (1938), pp. 407-413.

[16] T. E. Sterne, "The solution of a problem in probability," Science, Vol. 86 (1937), pp. 500-501.

[17] W. L. Stevens, "Distribution of entries in a contingency table," Annals of Eugenics, Vol. 8 (1938), pp. 238-244.

[18] W. L. Stevens, "Tests of significance for extra sensory perception data," Psychological Review, Vol. 46 (1938), pp. 142-150.

[19] S. S. Wilks, "Statistical aspects of experiments in telepathy," a lecture delivered to the Galois Institute of Mathematics, Long Island University, December 4, 1937.



ON THE CHOICE OF THE NUMBER OF CLASS INTERVALS IN THE 
APPLICATION OF THE CHI SQUARE TEST 

By H. B. Mann and A. Wald¹
Columbia University 


Introduction. To test whether a sample has been drawn from a population with a specified probability distribution, the range of the variable is divided into a number of class intervals and the statistic

(1) $\chi^2 = \sum_{i=1}^{k} \frac{(a_i - Np_i)^2}{Np_i}$

is computed. In (1), $k$ is the number of class intervals, $N$ the sample size, $a_i$ the number of observations in the $i$th class, and $p_i$ the probability that an observation falls into the $i$th class (calculated under the hypothesis to be tested). It is known that under the null hypothesis (the hypothesis to be tested) the statistic (1) has asymptotically the chi-square distribution with $k - 1$ degrees of freedom when each $Np_i$ is large. To test the null hypothesis, the upper tail of the chi-square distribution is used as a critical region.

In the literature only rules of thumb are found for the choice of the number and lengths of the class intervals. The purpose of this paper is to formulate principles for this choice and to determine the number and lengths of the class intervals according to these principles.

Once the number of class intervals has been chosen, it is always possible to find alternative hypotheses whose class probabilities equal the class probabilities under the null hypothesis. The least upper bound of the "distances" of such alternative distributions from the null hypothesis distribution can evidently be minimized by making the class probabilities under the null hypothesis equal to each other. By the distance of two distribution functions we mean the least upper bound of the absolute value of the difference of the two cumulative distribution functions. We have therefore based this paper on a procedure in which the lengths of the class intervals are chosen so that the probability of each class under the null hypothesis is $1/k$, where $k$ is the number of class intervals.²
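A minimal sketch of the equal-probability rule, assuming purely for illustration a standard normal null hypothesis. The $k - 1$ interior class boundaries are the null quantiles at $1/k, 2/k, \dots, (k-1)/k$, so every $p_i = 1/k$, and (1) then reduces to $(k/N)\sum a_i^2 - N$. The sketch uses Python's standard-library `statistics.NormalDist`; the function names, the seed and the sample are hypothetical.

```python
import bisect
import random
from statistics import NormalDist


def equiprobable_boundaries(dist, k):
    """Interior boundaries c_1 < ... < c_{k-1} with P(c_{i-1} < X <= c_i) = 1/k under dist."""
    return [dist.inv_cdf(i / k) for i in range(1, k)]


def chi_square_equiprobable(sample, dist, k):
    """Statistic (1) with p_i = 1/k for every class; also returns the class counts a_i."""
    cuts = equiprobable_boundaries(dist, k)
    counts = [0] * k
    for x in sample:
        counts[bisect.bisect_left(cuts, x)] += 1   # index of the class containing x
    expected = len(sample) / k                     # N p_i = N / k
    return sum((a - expected) ** 2 / expected for a in counts), counts


random.seed(1)
null = NormalDist(0.0, 1.0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
stat, counts = chi_square_equiprobable(sample, null, k=10)
print(round(stat, 3), counts)
```

Any continuous null distribution with an inverse c.d.f. could be substituted for `NormalDist`. The choice of $k$ itself is what the rest of the paper addresses.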

Let $C(\Delta)$ be the class of alternative distributions at a distance $\ge \Delta$ from the null hypothesis. Let $f(N, k, \Delta)$ be the greatest lower bound of the power of the chi-square test, with sample size $N$ and $k$ class intervals, with respect to alternatives in $C(\Delta)$. The maximum of $f(N, k, \Delta)$ with respect to $k$ is a function of $N$ and $\Delta$. It is most desirable to maximize $f(N, k, \Delta)$ for


‘Research under a grant in aid from the Carnegie Corporation of New York 
2 This procedure was first used by H Hotelling. “The consistency end ultimate dis¬ 
tribution of optimum statistics," Trans, Am. Math, Soc , Vol. 32, pp 861.) It has been 
advocated by K. J, Gumbel in ft paper which will appear shortly. 

306 



CLASS INTERVALS IN CHI SQUARE TEST 




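We now assume that $N$ is so large that the joint distribution of the $z_i$ is sufficiently well approximated by a multivariate normal distribution. Then

$$E(z_i^3) = 0, \qquad E(z_i^4) = 3[E(z_i^2)]^2, \qquad E(z_i^2 z_j^2) = E(z_i^2)E(z_j^2) + 2[E(z_i z_j)]^2 \quad \text{for } i \ne j.$$

We have the well known relations

$$E(z_i^2) = E(a_i^2) - N^2p_i^2 = Np_i(1-p_i),$$
$$E(z_i z_j) = E(a_i a_j) - N^2 p_i p_j = -Np_ip_j \qquad (i \ne j).$$

Using the above equations we obtain

$$\begin{aligned}
E\Big(\sum_{i=1}^k z_i^2\Big)^2 - \Big[E\Big(\sum_{i=1}^k z_i^2\Big)\Big]^2
&= N^2\Big\{3\sum_i p_i^2(1-p_i)^2 + \sum_{i\ne j}\big[p_ip_j(1-p_i)(1-p_j) + 2p_i^2p_j^2\big] - \Big[\sum_i p_i(1-p_i)\Big]^2\Big\}\\
&= 2N^2\Big[\sum_i p_i^2(1-p_i)^2 + \sum_{i\ne j} p_i^2p_j^2\Big]\\
&= 2N^2\Big[\sum_i p_i^2 - 2\sum_i p_i^3 + \Big(\sum_i p_i^2\Big)^2\Big].
\end{aligned}$$

Further

$$\begin{aligned}
E\Big(\sum_{i=1}^k y_i z_i\Big)^2 &= \sum_i y_i^2 E(z_i^2) + \sum_{i\ne j} y_iy_j E(z_iz_j)\\
&= N^3\Big[\sum_i p_i(1-p_i)\Big(p_i-\frac1k\Big)^2 - \sum_{i\ne j}p_ip_j\Big(p_i-\frac1k\Big)\Big(p_j-\frac1k\Big)\Big]\\
&= N^3\Big[\sum_i p_i^3 - \Big(\sum_i p_i^2\Big)^2\Big].
\end{aligned}$$

Since, under normality, $\sum z_i^2$ and $\sum y_iz_i$ are uncorrelated (all third moments vanish), we have $\sigma^2_{\chi'^2} = \frac{k^2}{N^2}\big\{E(\sum z_i^2)^2 - [E(\sum z_i^2)]^2 + 4E(\sum y_iz_i)^2\big\}$. Substituting the two results above into this formula we finally obtain

$$\sigma^2_{\chi'^2} = 2k^2\Big\{\sum_{i=1}^k p_i^2 + 2(N-1)\sum_{i=1}^k p_i^3 - (2N-1)\Big(\sum_{i=1}^k p_i^2\Big)^2\Big\}. \tag{4}$$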
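Formulas (3) and (4) can be checked mechanically. The following Python sketch (the function names are hypothetical, chosen for illustration) evaluates both with exact fractions and compares (4) with the Gaussian quadratic-form identity $\mathrm{Var}(z^\top z + 2y^\top z) = 2\,\mathrm{tr}(S^2) + 4\,y^\top S y$, where $S = N(\mathrm{diag}(p) - pp^\top)$ is the covariance of the $z_i$. This cross-check is our own and rests on the same normal approximation as (4).

```python
from fractions import Fraction


def chi2_prime_moments(p, N):
    """Mean (3) and normal-approximation variance (4) of chi'^2.

    p: class probabilities under the alternative (k = len(p) classes, each of
    probability 1/k under the null hypothesis); N: number of observations.
    """
    k = len(p)
    s2 = sum(pi ** 2 for pi in p)
    s3 = sum(pi ** 3 for pi in p)
    mean = k * (N - 1) * s2 + k - N                                     # (3)
    var = 2 * k ** 2 * (s2 + 2 * (N - 1) * s3 - (2 * N - 1) * s2 ** 2)  # (4)
    return mean, var


def variance_via_covariance(p, N):
    """Same variance from chi'^2 = (k/N) * sum (z_i + y_i)^2 with Gaussian z."""
    k = len(p)
    # Covariance of z_i = a_i - N p_i is N (diag(p) - p p^T).
    S = [[N * ((p[i] if i == j else 0) - p[i] * p[j]) for j in range(k)]
         for i in range(k)]
    y = [N * (pi - Fraction(1, k)) for pi in p]
    tr_S2 = sum(S[i][j] * S[j][i] for i in range(k) for j in range(k))
    yTSy = sum(y[i] * S[i][j] * y[j] for i in range(k) for j in range(k))
    # Centred Gaussian z: Var(z'z) = 2 tr(S^2), Var(y'z) = y'Sy, Cov(z'z, y'z) = 0.
    return Fraction(k, N) ** 2 * (2 * tr_S2 + 4 * yTSy)


k, N = 10, 100
mean, var = chi2_prime_moments([Fraction(1, k)] * k, N)
print(mean, var)  # 9 18
```

Under the null hypothesis the sketch returns mean $k-1$ and variance $2(k-1)$, as it should for a chi-square statistic with $k-1$ degrees of freedom.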

2. The Taylor expansion of the power. Let $C$ be determined so that the
probability under the null hypothesis that $\sum_{i=1}^k x_i^2 \ge C$ is equal to the size $\lambda_0$ of


H. B. MANN AND A. WALD 


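the critical region. Let $P(p_1, \dots, p_k)$ be the probability under the alternative hypothesis that $\sum_{i=1}^k x_i^2 \ge C$. Then the power $P$ is given by

$$P = \sum_{\sum x_i^2 \ge C} \frac{N!}{a_1!\,a_2!\cdots a_k!}\,p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}, \tag{5}$$

where

$$x_i = \frac{a_i - N/k}{\sqrt{N/k}}.$$

Hence

$$\sum_{i=1}^k x_i^2 = \frac{k}{N}\sum_{i=1}^k a_i^2 - N,$$

and (5) can be written in the form

$$P = \sum_{\sum a_i^2 \ge C'} \frac{N!}{a_1!\,a_2!\cdots a_k!}\,p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}, \tag{6}$$

where $C'$ is a certain function of $N$ and $k$. Let $\delta_i = p_i - 1/k$, where $p_i$ is the probability of the $i$th class interval under the alternative hypothesis.

Expanding $P$ into a power series we obtain (in this and the following derivations, we take all partial derivatives at the point $\delta_1 = \delta_2 = \cdots = \delta_k = 0$)

$$P = \lambda_0 + \sum_{i=1}^k \frac{\partial P}{\partial \delta_i}\delta_i + \frac12\sum_{i=1}^k\sum_{j=1}^k \frac{\partial^2 P}{\partial\delta_i\,\partial\delta_j}\delta_i\delta_j + \cdots$$

Since $P$ is a symmetric function of the $\delta_i$, we have for $\delta_1 = \delta_2 = \cdots = \delta_k = 0$

$$\frac{\partial P}{\partial\delta_1} = \cdots = \frac{\partial P}{\partial\delta_k}, \qquad \frac{\partial^2 P}{\partial\delta_1^2} = \cdots = \frac{\partial^2 P}{\partial\delta_k^2}, \qquad \frac{\partial^2 P}{\partial\delta_i\,\partial\delta_j} = \frac{\partial^2 P}{\partial\delta_1\,\partial\delta_2} \quad \text{for } i \ne j.$$

Furthermore $\sum_{i=1}^k \delta_i = 0$. Therefore

$$P = \lambda_0 + \frac12\left(\frac{\partial^2 P}{\partial\delta_1^2} - \frac{\partial^2 P}{\partial\delta_1\,\partial\delta_2}\right)\sum_{i=1}^k\delta_i^2 + \cdots$$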


We shall first show that the terms of second order are always positive. This
shows that the test is unbiased and justifies again the choice of equal class
probabilities under the null hypothesis, since this assures unbiasedness and minimizes, among all unbiased tests, the g.l.b. of the distances of such alternatives
whose power is equal to the size of the critical region.

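The power is given by

$$P = \sum_{a_1^2 + a_2^2 + \cdots + a_k^2 \ge C'} \frac{N!}{a_1!\,a_2!\cdots a_k!}\,p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}. \tag{7}$$

Since $\sum_{i\ne j}\delta_i\delta_j = -\sum_{i=1}^k\delta_i^2$, we obtain for the second order terms

$$\frac{k^2}{2}\sum (a_1^2 - a_1 - a_1a_2)\,p(a_1, a_2, \dots, a_k)\sum_{i=1}^k\delta_i^2,$$

where

$$p(a_1, a_2, \dots, a_k) = \frac{N!}{a_1!\,a_2!\cdots a_k!}\cdot\frac{1}{k^N}.$$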

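In the following derivation we extend all sums, if not otherwise stated, over all terms for which $\sum_{i=1}^k a_i^2 \ge C'$, and we use the relation $\sum_{i=1}^k a_i = N$. Because of the symmetry we have

$$\sum a_1^2\,p(a_1, \dots, a_k) = \frac1k\sum\Big(\sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k),$$

$$\sum a_1\,p(a_1, \dots, a_k) = \frac1k\sum\Big(\sum_{i=1}^k a_i\Big)p(a_1, \dots, a_k) = \frac{N}{k}\lambda_0,$$

$$\sum a_1a_2\,p(a_1, \dots, a_k) = \frac{1}{k(k-1)}\sum\Big(N^2 - \sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k) = \frac{N^2}{k(k-1)}\lambda_0 - \frac{1}{k(k-1)}\sum\Big(\sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k).$$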


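Hence the coefficient of the second order term becomes (apart from the positive factor $k^2/2$)

$$\Big(\frac1k + \frac{1}{k(k-1)}\Big)\sum\Big(\sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k) - \frac{N}{k}\lambda_0 - \frac{N^2}{k(k-1)}\lambda_0 = \frac{1}{k-1}\sum\Big(\sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k) - \frac{N}{k}\lambda_0 - \frac{N^2}{k(k-1)}\lambda_0.$$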


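But

$$\frac{1}{\lambda_0}\sum\Big(\sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k) > E\Big(\sum_{i=1}^k a_i^2\Big),$$

since the conditional mean for values of $\sum a_i^2 \ge C'$ must be larger than the mean of all values of $\sum_{i=1}^k a_i^2$. Since $E\big(\sum_{i=1}^k a_i^2\big) = \frac{N^2}{k} + \frac{N(k-1)}{k}$, we obtain

$$\frac{1}{k-1}\sum\Big(\sum_{i=1}^k a_i^2\Big)p(a_1, \dots, a_k) - \frac{N}{k}\lambda_0 - \frac{N^2}{k(k-1)}\lambda_0 > \frac{\lambda_0}{k-1}\Big(\frac{N^2}{k} + \frac{N(k-1)}{k}\Big) - \lambda_0\Big(\frac{N^2}{k(k-1)} + \frac{N}{k}\Big) = 0,$$

and hence the coefficient of $\sum_{i=1}^k\delta_i^2$ is larger than 0.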
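For small $N$ and $k$ the power (6) can be evaluated exactly by enumerating all $(a_1, \dots, a_k)$, which gives a direct numerical check of the second order term. The sketch below is only an illustration under our own choices ($N = 6$, $k = 3$, $C' = 20$, perturbation $\delta = (10^{-3}, -10^{-3}, 0)$; the helper names are hypothetical), and brute-force enumeration is practical only for small $N$ and $k$. Because the critical region is symmetric in $a_1, a_2$ and swapping them maps $\delta$ to $-\delta$, the odd-order terms cancel, so the change in power should agree closely with $\frac{k^2}{2}\cdot\text{coef}\cdot\sum\delta_i^2$.

```python
from math import factorial


def compositions(N, k):
    """All (a_1, ..., a_k) of non-negative integers with a_1 + ... + a_k = N."""
    if k == 1:
        yield (N,)
        return
    for first in range(N + 1):
        for rest in compositions(N - first, k - 1):
            yield (first,) + rest


def multinomial_prob(a, p):
    """N!/(a_1! ... a_k!) * p_1^a_1 ... p_k^a_k."""
    coef = factorial(sum(a))
    for ai in a:
        coef //= factorial(ai)          # exact at every step
    prob = float(coef)
    for ai, pi in zip(a, p):
        prob *= pi ** ai
    return prob


def in_region(a, c_prime):
    return sum(ai * ai for ai in a) >= c_prime


def exact_power(N, p, c_prime):
    """Equation (6): probability that sum a_i^2 >= C' under class probabilities p."""
    return sum(multinomial_prob(a, p)
               for a in compositions(N, len(p)) if in_region(a, c_prime))


def second_order_coefficient(N, k, c_prime):
    """Sum over the region of (a_1^2 - a_1 - a_1 a_2) p(a_1, ..., a_k) at p_i = 1/k."""
    null = [1.0 / k] * k
    return sum((a[0] ** 2 - a[0] - a[0] * a[1]) * multinomial_prob(a, null)
               for a in compositions(N, k) if in_region(a, c_prime))


N, k, c_prime = 6, 3, 20
lam0 = exact_power(N, [1 / k] * k, c_prime)
coef = second_order_coefficient(N, k, c_prime)
delta = [1e-3, -1e-3, 0.0]
p_alt = [1 / k + d for d in delta]
observed = exact_power(N, p_alt, c_prime) - lam0
predicted = 0.5 * k ** 2 * coef * sum(d * d for d in delta)
```

For these values the region gives $\lambda_0 = 129/729$ and a coefficient of $390/729 > 0$, in line with the proof above.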

To prove Theorem 1, we will have to determine the alternative distribution for which $\sum_{i=1}^k \delta_i^2$ becomes a minimum subject to the condition that the distance from the null hypothesis should be greater than or equal to a given $\Delta$.

Hence we have to find a distribution function $F(x)$ such that $|F(x) - x| \ge \Delta$ for at least one value $x$ and

$$\sum_{i=1}^k \delta_i^2 = \sum_{i=1}^k\Big(p_i - \frac1k\Big)^2 = \sum_{i=1}^k p_i^2 - \frac1k$$

is a minimum, where $p_i = F(i/k) - F((i-1)/k)$. Instead of minimizing $\sum\delta_i^2$ we may minimize $\sum p_i^2$, since the two expressions differ merely by a constant. There will be two different solutions for $F(x)$ depending on whether $F(x) - x \ge \Delta$ or $F(x) - x \le -\Delta$ for at least one value $x$. Because of symmetry we restrict ourselves to the case in which $F(x) - x \ge \Delta$ for at least one value of $x$.

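Let $a$ be a value for which $F(a) - a \ge \Delta$ and suppose that

$$\frac{l-1}{k} < a \le \frac{l}{k};$$

then

$$F(a) \ge a + \Delta.$$

Put $F(l/k) = l/k + \epsilon$. We prove first

$$\epsilon > \Delta - \frac1k.$$

Proof: Since $F(l/k) - F(a) \ge 0$ we have

$$F\Big(\frac{l}{k}\Big) = F(a) + F\Big(\frac{l}{k}\Big) - F(a) \ge a + \Delta > \frac{l-1}{k} + \Delta,$$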



such values of $\Delta$ for which $\phi(N, \Delta)$ is neither too large nor too small, and in this
paper we propose to determine $\Delta$ so that $\phi(N, \Delta)$ is equal to $\frac12$.

Hence we introduce the following definitions: 

Definition 1. A positive integer $k$ is called best with respect to the number of
observations $N$ if there exists a $\Delta$ such that $f(N, k, \Delta) = \frac12$ and $f(N, k', \Delta) \le \frac12$
for any positive integer $k'$.

Definition 2. A positive integer $k$ is called $\epsilon$-best ($0 \le \epsilon \le 1$) with respect to
the number of observations $N$ if $\epsilon$ is the smallest number in the interval $[0, 1]$ for
which the following condition is fulfilled: there exists a $\Delta$ such that $f(N, k, \Delta) \ge
\frac12 - \epsilon$ and $f(N, k', \Delta) \le \frac12 + \epsilon$ for any positive integer $k'$.

It is obvious that an $\epsilon$-best $k$ is a best $k$ if $\epsilon = 0$. If $\epsilon$ is very small, an $\epsilon$-best
$k$ is for all practical purposes equivalent to a best $k$.

Since $f(N, k, \Delta)$ is a continuous function of $\Delta$, it is easy to see that for any
pair of positive integers $k$ and $N$ there exists exactly one value $\epsilon$ such that $k$ is
$\epsilon$-best with respect to the number of observations $N$. Since the value of this $\epsilon$
is a function of $k$ and $N$ we will denote it by $\epsilon(k, N)$.

Definition 3. A sequence $\{k_N\}$ of positive integers is called best in the limit if

$$\lim_{N\to\infty}\epsilon(k_N, N) = 0.$$


In this paper the following theorem is proved:

Theorem 1. Let $k_N = 4\sqrt[5]{\dfrac{2(N-1)^2}{c^2}}$, where $c$ is determined so that

$$\frac{1}{\sqrt{2\pi}}\int_c^\infty e^{-x^2/2}\,dx$$

is equal to the size of the critical region (probability of the critical region under the null hypothesis); then the sequence $\{k_N\}$ is best in the limit. Furthermore $\lim_{N\to\infty} f(N, k_N, \Delta_N) = \frac12$ for $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$.

It is further shown that for $N \ge 450$, if the 5% level of significance is used,
and for $N \ge 300$, if the 1% level of significance is used, the value of $\epsilon(k_N, N)$
is small, so that for practical purposes $k_N$ can be considered as a best $k$. The
authors are convinced, although no rigorous proof has been given, that $\epsilon(k_N, N)$
is quite small for $N \ge 200$ and is very likely to be small even for considerably
lower values of $N$.


1. Mean value and standard deviation of the statistic under alternative hypotheses. It is well known that every continuous distribution can by a simple
transformation be transformed into a rectangular distribution with range $[0, 1]$.
We may therefore for convenience assume that the hypothesis to be tested is
that of a rectangular distribution with the range $[0, 1]$. Moreover, as mentioned
earlier, we assume that a procedure is chosen by which the class probabilities
under the null hypothesis are equal to each other.

The statistic whose mean value and standard deviation is to be determined is

$$\sum_{i=1}^k x_i^2 = \chi'^2, \qquad \text{where } x_i = \frac{a_i - N/k}{\sqrt{N/k}}.$$





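Let $p_i$ be the probability under the alternative hypothesis that one observation will fall into the $i$th class. The probability of obtaining certain specified values of $a_1, a_2, \dots, a_k$ is given by

$$f(a_1, \dots, a_k) = \frac{N!}{a_1!\,a_2!\cdots a_k!}\,p_1^{a_1}p_2^{a_2}\cdots p_k^{a_k}.$$

Since $\sum_{i=1}^k a_i = N$ we have

$$\sum f(a_1, \dots, a_k) = (p_1 + p_2 + \cdots + p_k)^N = 1.$$

We consider the function

$$(p_1e^{t_1} + p_2e^{t_2} + \cdots + p_ke^{t_k})^N = \sum f(a_1, \dots, a_k)\,e^{a_1t_1 + a_2t_2 + \cdots + a_kt_k}.$$

Differentiating twice and then setting $t_i = 0$ for $i = 1, 2, \dots, k$ we obtain

$$N(N-1)p_i^2 + Np_i = E(a_i^2), \qquad N(N-1)p_ip_j = E(a_ia_j) \quad \text{for } i \ne j. \tag{2}$$

Hence

$$E\Big(\sum_{i=1}^k a_i^2\Big) = N(N-1)\sum_{i=1}^k p_i^2 + N$$

and

$$E(\chi'^2) = k(N-1)\sum_{i=1}^k p_i^2 + k - N. \tag{3}$$

To compute the standard deviation of $\chi'^2$ we put

$$z_i = a_i - Np_i, \qquad y_i = N\Big(p_i - \frac1k\Big), \qquad \text{so that} \qquad \chi'^2 = \frac{k}{N}\sum_{i=1}^k (z_i + y_i)^2.$$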




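and hence $\epsilon = F(l/k) - l/k > \Delta - 1/k$.

If $\Delta < 1/k$ we can always find a distribution function in $C(\Delta)$ for which $p_i = 1/k$ $(i = 1, \dots, k)$. Hence we consider only the case $k \ge 1/\Delta$. We must minimize $\sum_{i=1}^k p_i^2$ under the condition

$$\sum_{i=1}^{l} p_i = \frac{l}{k} + \epsilon, \qquad \sum_{i=l+1}^{k} p_i = \frac{k-l}{k} - \epsilon.$$

We therefore minimize

$$\Phi = \sum_{i=1}^k p_i^2 - 2\lambda_1\sum_{i=1}^{l} p_i - 2\lambda_2\sum_{i=l+1}^{k} p_i.$$

This leads to

$$p_i = \frac1k + \frac{\epsilon}{l} \quad \text{for } i = 1, \dots, l, \qquad p_i = \frac1k - \frac{\epsilon}{k-l} \quad \text{for } i = l+1, \dots, k.$$

We then have

$$\sum_{i=1}^k p_i^2 = \frac1k + \frac{\epsilon^2}{l} + \frac{\epsilon^2}{k-l}.$$

This is smallest if $\epsilon = \Delta - 1/k$ and $l = k/2$. The following discontinuous distribution function gives these values for $\epsilon$, $l$ and $p_i$ and has the distance $\Delta$ from the rectangular distribution:

$$F(x) = \begin{cases} 0 & \text{for } x \le 0,\\ x\big[1 + 2(\Delta - \frac1k)\big] & \text{for } 0 \le x \le \frac12 - \frac1k,\\ \frac12 + \Delta - \frac1k & \text{for } \frac12 - \frac1k < x \le \frac12,\\ \frac12 + \big(\Delta - \frac1k\big) + \big(x - \frac12\big)\big[1 - 2(\Delta - \frac1k)\big] & \text{for } \frac12 < x < 1,\\ 1 & \text{for } x \ge 1. \end{cases} \tag{8}$$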
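The least favourable alternative (8) can be tabulated directly. The sketch below is a hypothetical helper, written for even $k$ so that $l = k/2$ is an integer; odd $k$ is not handled. It uses exact fractions to confirm that the class probabilities obtained from $F$ at the points $i/k$ agree with the Lagrange solution, that $\sum\delta_i^2 = 4(\Delta - 1/k)^2/k$, and that the sup-distance $\Delta$ is approached just to the right of $x = \frac12 - \frac1k$.

```python
from fractions import Fraction


def least_favourable_probs(k, delta):
    """Lagrange solution with l = k/2: l classes get 1/k + eps/l, the rest 1/k - eps/(k - l)."""
    if k % 2:
        raise ValueError("this sketch assumes an even number of classes (l = k/2)")
    if delta < Fraction(1, k):
        raise ValueError("for Delta < 1/k equal class probabilities are attainable")
    eps = delta - Fraction(1, k)
    l = k // 2
    return [Fraction(1, k) + eps / l] * l + [Fraction(1, k) - eps / (k - l)] * (k - l)


def F(x, k, delta):
    """The discontinuous distribution function (8)."""
    eps = delta - Fraction(1, k)
    half = Fraction(1, 2)
    if x <= 0:
        return Fraction(0)
    if x <= half - Fraction(1, k):
        return x * (1 + 2 * eps)
    if x <= half:
        return half + eps                      # all mass of class l sits at its left end
    if x < 1:
        return half + eps + (x - half) * (1 - 2 * eps)
    return Fraction(1)


k, delta, N = 10, Fraction(3, 20), 100
probs = least_favourable_probs(k, delta)
from_F = [F(Fraction(i, k), k, delta) - F(Fraction(i - 1, k), k, delta) for i in range(1, k + 1)]
print(probs == from_F)  # True
```

With (3), this alternative gives $E(\chi'^2) - (k-1) = 4(N-1)(\Delta - 1/k)^2$, the expression used in Section 3.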


3. Solution for large N. Denote by $F(\Delta, k)$ the distribution function (8) of
$C(\Delta)$ which makes $\sum_{i=1}^k \delta_i^2$ a minimum if the test is made with $k$ class intervals.

Assume that $k$ is large enough that $\chi'^2$ can be taken as normally distributed.
The power of the test is then given by



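$$P = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{(k-1)+c\sqrt{2(k-1)}}^{\infty} e^{-\frac{(t - E(\chi'^2))^2}{2\sigma^2}}\,dt = \frac{1}{\sqrt{2\pi}}\int_{\frac{(k-1) + c\sqrt{2(k-1)} - E(\chi'^2)}{\sigma}}^{\infty} e^{-y^2/2}\,dy, \tag{9}$$

where $\sigma$ is the standard deviation of $\sum_{i=1}^k x_i^2$ and $c$ is determined so that

$$\frac{1}{\sqrt{2\pi}}\int_c^\infty e^{-y^2/2}\,dy$$

is equal to the size of the critical region (under the null hypothesis $\chi'^2$ has mean $k-1$ and approximately variance $2(k-1)$, so that the critical value is $(k-1) + c\sqrt{2(k-1)}$). Hence to maximize the power with respect to $k$ is equivalent to maximizing

$$\psi(k) = \frac{E\big(\sum_{i=1}^k x_i^2\big) - (k-1) - c\sqrt{2(k-1)}}{\sigma}$$

with respect to $k$.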

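Under the alternative $F(\Delta, k)$ we obtain

$$E\Big(\sum_{i=1}^k x_i^2\Big) - (k-1) = k(N-1)\sum_{i=1}^k p_i^2 + k - N - k + 1 = 4(N-1)\Big(\Delta - \frac1k\Big)^2.$$

Hence

$$\psi(k) = \frac{4(N-1)\big(\Delta - \frac1k\big)^2 - c\sqrt{2(k-1)}}{\sigma}.$$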


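We choose $\Delta$ so that this maximum power is exactly $\frac12$, that is, so that $\psi(k) = 0$
for that $k$ which maximizes $\psi(k)$. Denote this value of $\Delta$ by $\Delta_N$ and let $k_N$ be
the value of $k$ which maximizes $\psi(k)$. Since the numerator of $\psi(k)$ vanishes at $k = k_N$, the condition $\psi'(k_N) = 0$ implies that the derivative of the numerator with respect to $k$ is also equal to 0 for $k = k_N$. Hence

$$8(N-1)\Big(\Delta_N - \frac{1}{k_N}\Big)\frac{1}{k_N^2} = \frac{c}{\sqrt{2(k_N-1)}}. \tag{10}$$

Furthermore, since $\psi(k_N) = 0$, we have

$$4(N-1)\Big(\Delta_N - \frac{1}{k_N}\Big)^2 = c\sqrt{2(k_N-1)}. \tag{11}$$

Solving equations (10) and (11) we obtain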


$$\Delta_N = \frac{5}{k_N} - \frac{4}{k_N^2}, \qquad k_N = 4\sqrt[5]{\frac{2(N-1)^2}{c^2}\Big(\frac{k_N-1}{k_N}\Big)^3} \tag{12}$$


and 



or, since $k_N > 3$,




Hence

$$k = [k_N] \quad \text{or} \quad k = [k_N] + 1 \tag{13}$$

is the value of $k$ for which the power with respect to $F(\Delta_N, k)$ becomes a maximum, where $[k_N]$ denotes the integer part of $k_N$. We have merely to show that $\psi''(k)$ is negative for $k = k_N$.

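Using the fact that $\psi(k_N) = \psi'(k_N) = 0$ we obtain

$$\psi''(k_N) = \frac1\sigma\left[\frac{8(N-1)}{k_N^4} - \frac{16(N-1)}{k_N^3}\Big(\Delta_N - \frac{1}{k_N}\Big) + \frac{c}{\big(\sqrt{2(k_N-1)}\big)^3}\right].$$

Substituting for $\Delta_N$ the right hand side of (12) we obtain on account of (10)

$$\psi''(k_N) = \frac1\sigma\left[-\frac{56(N-1)}{k_N^4} + \frac{64(N-1)}{k_N^5} + \frac{8(N-1)}{2(k_N-1)}\Big(\frac{4}{k_N^3} - \frac{4}{k_N^4}\Big)\right].$$

Using $2(k_N - 1) > k_N$ we obtain

$$\psi''(k_N) < \frac{1}{\sigma k_N^4}\left(-24(N-1) + \frac{64}{k_N}(N-1)\right),$$

which is negative since $k_N > 3$. $\sigma$ can be shown to be of order $k_N^{1/2}$; $\psi''(k_N)$ is, therefore, of order

$$\frac{N}{k_N^{9/2}} = O\big(N^{-4/5}\big).$$

The maximum is, therefore, rather flat for large values of $N$.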
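Equations (10) and (11) can also be solved numerically. Eliminating $\Delta_N$ gives $k_N^4/(k_N-1)^{3/2} = 64(N-1)/(c\sqrt2)$, whose left side increases for $k > 1.6$, so bisection applies. The following is a sketch under our own assumptions: a one-sided normal point $c$ taken from Python's `statistics.NormalDist`, and hypothetical helper names. It compares the exact real root with the asymptotic value of Theorem 1, and checks that, among integers $k \ge 1/\Delta_N$ (where the alternative (8) is defined), the numerator of $\psi(k)$ peaks at a neighbour of $k_N$ with value about 0.

```python
import math
from statistics import NormalDist


def solve_k_delta(N, alpha):
    """Real k_N and Delta_N solving (10)-(11), plus the normal point c."""
    c = NormalDist().inv_cdf(1 - alpha)
    target = 64 * (N - 1) / (c * math.sqrt(2))
    lo, hi = 2.0, 1e6                      # k^4 / (k - 1)^1.5 is increasing here
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid ** 4 / (mid - 1) ** 1.5 < target:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    return k, 5 / k - 4 / k ** 2, c        # Delta_N from (12)


def numerator(k, N, delta, c):
    """Numerator of psi(k) under F(Delta, k): 4(N-1)(Delta - 1/k)^2 - c sqrt(2(k-1))."""
    return 4 * (N - 1) * (delta - 1 / k) ** 2 - c * math.sqrt(2 * (k - 1))


N = 450
k, d, c = solve_k_delta(N, 0.05)
asymptotic = 4 * (2 * (N - 1) ** 2 / c ** 2) ** 0.2
best = max(range(math.ceil(1 / d), 200), key=lambda kk: numerator(kk, N, d, c))
print(round(k, 2), round(asymptotic, 2), best)
```

The exact root and the asymptotic formula differ by only a few per cent at $N = 450$, consistent with the flat maximum noted above.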

We shall now show that if $k$ is large enough to assume $\chi'^2$ to be normally
distributed, then $F(\Delta, k)$ is the alternative which gives the smallest power compared with all alternatives in the class $C(\Delta)$, provided the power for the alternative $F(\Delta, k)$ equals $\frac12$.

We know that $E\big(\sum_{i=1}^k x_i^2\big)$ is smallest for $F(\Delta, k)$. Since the power with respect
to $F(\Delta, k)$ equals $\frac12$ we have

$$E\Big(\sum_{i=1}^k x_i^2\Big) - (k-1) - c\sqrt{2(k-1)} = 0.$$

Thus the lower limit of the integral in (9) becomes negative for every other
alternative and the power will be larger than $\frac12$.

The power with respect to $F(\Delta_N, k_N)$ is equal to $\frac12$; hence if we choose $k = k_N$
the power of the test will be $\ge \frac12$ for all alternatives in the class $C(\Delta_N)$. On the
other hand, if we choose $k \ne k_N$, then there will be at least one alternative in
$C(\Delta_N)$ for which the power is $< \frac12$. (For instance $F(\Delta_N, k)$ is such an alternative.)

³ Cantelli's formula and its proof are given by Fréchet in his book Recherches Théoriques
Modernes sur la Théorie des Probabilités, Paris (1937), pp. 123-126.

The above statements have been derived under the assumption that $\chi'^2$ is
normally distributed. Hence if the distribution of $\chi'^2$ were exactly normal,

$$k_N = 4\sqrt[5]{\frac{2(N-1)^2}{c^2}}$$

would be a best $k$, and for this $k_N$ and $\Delta_N = \dfrac{5}{k_N} - \dfrac{4}{k_N^2}$ the
greatest lower bound of the power in the class $C(\Delta_N)$ would be exactly $\frac12$. Since
the distribution of $\chi'^2$ approaches the normal distribution as $k \to \infty$, the
sequence $\{k_N\}$ is best in the limit and Theorem 1 stated in the introduction
is proved.

For the purposes of practical applications, it is not enough to know that
$\{k_N\}$ is best in the limit. We have to know for what values of $N$, $k_N$ can be
considered practically as a best $k$, i.e., for what values of $N$ the quantity $\epsilon(k_N, N)$
defined in the introduction is sufficiently small. The quantity $\epsilon(k_N, N)$ is certainly small if for the number of class intervals $k_N$ the distribution of $\chi'^2$ is near
to normal, and if the power with respect to at least one alternative of the class
$C(\Delta_N)$ is smaller than $\frac12$ also in the case when the number of class intervals is too
small to assume a normal distribution for $\chi'^2$.

We shall in the following assume that for $k > 13$ the normal distribution is a
sufficiently good approximation. Actually we need not assume a normal distribution but only that the probability is close to $\frac12$ that the statistic will exceed
its mean value.

Cantelli³ gave the following formula. Let $M_r$ be the $r$th (absolute) moment of a distribution about $x_0$, that is, $M_r = E|x - x_0|^r$. Let $d$ be any arbitrary positive number, and let $P(|x - x_0| < d)$ be the probability that $|x - x_0| < d$. Then the following inequalities hold:

If $\dfrac{M_r}{d^r} \le \dfrac{M_{2r}}{d^{2r}}$, then $P(|x - x_0| < d) \ge 1 - \dfrac{M_r}{d^r}$.

If $\dfrac{M_r}{d^r} > \dfrac{M_{2r}}{d^{2r}}$, then $P(|x - x_0| < d) \ge 1 - \dfrac{M_{2r} - M_r^2}{(d^r - M_r)^2 + M_{2r} - M_r^2}$.

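Since $\chi'^2$ can only take positive values we have, taking $r = 1$, $x_0 = 0$ and $d = c_k$,

$$\text{If } \frac{E(\chi'^2)}{c_k} \le \frac{\sigma^2_{\chi'^2} + E^2(\chi'^2)}{c_k^2}, \text{ then } P(\chi'^2 < c_k) \ge 1 - \frac{E(\chi'^2)}{c_k}. \tag{14}$$

$$\text{If } \frac{E(\chi'^2)}{c_k} > \frac{\sigma^2_{\chi'^2} + E^2(\chi'^2)}{c_k^2}, \text{ then } P(\chi'^2 < c_k) \ge 1 - \frac{\sigma^2_{\chi'^2}}{(c_k - E(\chi'^2))^2 + \sigma^2_{\chi'^2}}. \tag{15}$$

Here $c_k$ is determined so that $P(\chi'^2 \ge c_k)$ equals the size of the critical region
if the null hypothesis is true and the number of class intervals equals $k$; $c_k$ can
be obtained from a table of the chi-square distribution.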

For $F(\Delta_N, k)$ we obtain from (3) and (4), with

$$\Delta'_N = \Delta_N - \frac1k = \frac{5}{k_N} - \frac{4}{k_N^2} - \frac1k,$$

$$E(\chi'^2) = (k-1) + 4(N-1)\Delta_N'^2,$$

$$\sigma^2_{\chi'^2} = 2(k-1) + 8\Delta_N'^2(k + 2N - 4) - 32(2N-1)\Delta_N'^4.$$
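These closed forms follow from (3) and (4) with the class probabilities $\frac1k(1 \pm 2\Delta'_N)$ of the alternative (8). The exact-fraction check below is a sketch of our own (hypothetical helper names). It uses an even $k$, because (8) assumes $l = k/2$. The critical values $c_k$ needed for (16) and (17) would still have to come from a chi-square table (or, for instance, `scipy.stats.chi2.ppf`), which the sketch does not attempt.

```python
from fractions import Fraction


def moments_closed_form(N, k, dp):
    """E(chi'^2) and sigma^2 for F(Delta_N, k) as displayed above; dp = Delta'_N."""
    d2 = dp ** 2
    mean = (k - 1) + 4 * (N - 1) * d2
    var = 2 * (k - 1) + 8 * d2 * (k + 2 * N - 4) - 32 * (2 * N - 1) * d2 ** 2
    return mean, var


def moments_general(N, p):
    """Equations (3) and (4) for arbitrary class probabilities p."""
    k = len(p)
    s2 = sum(x ** 2 for x in p)
    s3 = sum(x ** 3 for x in p)
    mean = k * (N - 1) * s2 + k - N
    var = 2 * k ** 2 * (s2 + 2 * (N - 1) * s3 - (2 * N - 1) * s2 ** 2)
    return mean, var


def least_favourable(k, dp):
    """Class probabilities of (8) with l = k/2 (k even): (1 +- 2 dp)/k."""
    half = k // 2
    return [Fraction(1, k) + 2 * dp / k] * half + [Fraction(1, k) - 2 * dp / k] * half


N, k, dp = 450, 12, Fraction(1, 40)
print(moments_closed_form(N, k, dp) == moments_general(N, least_favourable(k, dp)))  # True
```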

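By numerically calculating $E(\chi'^2)$ and $\sigma_{\chi'^2}$ for $N = 450$ and a 5% level of significance, for $N = 300$ and a 1% level of significance, and for $k = 13, 12, \dots$, it can be shown that for these values of $N$ and $k$

$$\frac{E(\chi'^2)}{c_k} > \frac{\sigma^2_{\chi'^2} + E^2(\chi'^2)}{c_k^2}. \tag{16}$$

Hence we have to use (15). From (16) it follows that $c_k > E(\chi'^2)$. If $P(\chi'^2 \ge c_k) \ge \frac12$, we obtain on account of (15) and (16)

$$\frac{\sigma^2_{\chi'^2}}{(c_k - E(\chi'^2))^2 + \sigma^2_{\chi'^2}} \ge \frac12, \qquad \text{that is,} \qquad \sigma_{\chi'^2} + E(\chi'^2) \ge c_k.$$

Numerical calculation shows that for the values of $N$ and $k$ and the significance levels considered

$$\sigma_{\chi'^2} + E(\chi'^2) < c_k. \tag{17}$$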

It can then be shown that for $N \ge 450$ and $N \ge 300$ respectively $N\Delta_N'^2$ decreases
with $N$. A simple argument then shows that (16) and (17) are also true for
all values $N \ge 450$ and $N \ge 300$ respectively. Hence the power with respect
to $F(\Delta_N, k)$ is $< \frac12$ for these values of $N$. Thus we see: for $N \ge 450$ if the 5%
level is used, and for $N \ge 300$ if the 1% level is used, the value

$$k_N = 4\sqrt[5]{\frac{2(N-1)^2}{c^2}}$$

can be considered for practical purposes as a best $k$. The value $c$ is determined so that

$$\frac{1}{\sqrt{2\pi}}\int_c^\infty e^{-t^2/2}\,dt$$

is equal to the size of the critical region.
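The practical rule amounts to a one-line computation. The sketch below assumes Python's `statistics.NormalDist` for the upper $\alpha$ point $c$; the function name is hypothetical. It leaves $k_N$ unrounded, since the text only says that one of the neighbouring integers is to be used.

```python
from statistics import NormalDist


def mann_wald_k(N, alpha):
    """k_N = 4 * (2 (N - 1)^2 / c^2) ** (1/5), c the upper alpha point of N(0, 1)."""
    c = NormalDist().inv_cdf(1 - alpha)
    return 4 * (2 * (N - 1) ** 2 / c ** 2) ** 0.2


for N, alpha in [(300, 0.01), (450, 0.05), (1000, 0.05)]:
    k = mann_wald_k(N, alpha)
    print(N, alpha, round(k, 1), round(5 / k - 4 / k ** 2, 4))   # k_N and Delta_N
```

For $N = 450$ at the 5% level this gives roughly 43 classes, and for $N = 300$ at the 1% level roughly 32.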



LIMITED TYPE OF PRIMARY PROBABILITY DISTRIBUTION 
APPLIED TO ANNUAL MAXIMUM FLOOD FLOWS 

By Bradford F. Kimball 
Port Washington, N. Y.

1. Theoretical statement of problem. There is no doubt that Gumbel's
recent paper "The Return Period of Flood Flows" [1] has supplied an admirably
simple technique for engineers to use in approximating the trend of return periods
of annual maximum flood flows for purposes of extrapolation. This treatment
is scientifically of great interest because it introduces for the first time into a
subject already treated at considerable length by engineers, the theory of the
probability distribution of maximum values as developed by Fisher and Tippett, von Mises, and others.¹ However, certain further observations should be
made concerning the approach used by Gumbel.

Let $x$ represent the measure of daily stream flow having a probability distribution $w(x)$. Let the probability distribution of the associated annual maximum
stream flows be denoted by $V(x)$, with

$$W(x) = \int_0^x V(s)\,ds \tag{1}$$

denoting the probability that annual maxima be less than or equal to $x$. The
return period $T(x)$ of an annual maximum flow of measure $x$ is then defined by

$$T(x) = \frac{1}{1 - W(x)}. \tag{2}$$

In this paper the probability distribution $w(x)$ will be called the primary
probability distribution associated with the probability distribution of maximum
values $V(x)$ and its cumulative distribution $W(x)$.

Gumbel argues that for the type of primary probability distribution that
might reasonably be expected to apply, $W(x)$ will be of the type introduced by
R. A. Fisher¹

$$W(x) = \exp\big[-e^{-\alpha(x - u)}\big]. \tag{3}$$

It is further implied that a primary probability distribution involving an upper
limit would lead to a probability distribution of maximum values of the type



for which moments of order k or higher do not exist. The inference is then 
drawn that a primary probability distribution leading to such a cumulative 
distribution of maximum values would seem to be less likely to be the correct 

¹ See references at end of Gumbel's paper, loc. cit.




PRIMARY PROBABILITY DISTRIBUTION 




one than one leading to the distribution (3). To this argument we do not 
object; but we question the implied conclusion that hence the use of a limited 
type of primary distribution is to be disallowed. 

If the primary probability distribution be of the limited Galton type

$$w(x) = K\exp\big(-\tfrac12u^2\big), \tag{5}$$

where $K$ is a constant and

$$u = k[b - \log(a - x)], \qquad 0 \le x \le a, \tag{6}$$

it can be shown that the limiting form of the cumulative distribution of maxima
of $n$ values takes the same type form (3), where $x$ is replaced by $u$. This can be
seen by observing that the transformed variate $u$ becomes infinite as $x$ approaches
$a$, and hence has infinite range to the right, which places (5) in the category of
distributions which are known to lead to a cumulative distribution of maximum
values of form (3). More explicitly, considering $w(x)$ as a finite distribution in
$x$, if one traces the reasoning as set forth in von Mises' derivation [2] of the limiting distribution (3), one finds that since the cumulative primary probability
$\int_0^x w(s)\,ds$ does not have a non-vanishing derivative of finite order at $x = a$,
what von Mises terms the case of a limited distribution does not apply, while
the argument for a cumulative distribution of maxima of form (3) does carry
through, in spite of the fact that $x$ has limited range to the right. This fact
was not mentioned by Gumbel.

One is thus led to the conclusion that there is no logical exclusion of the
assumption of a primary probability distribution of the form (5).

One might well argue for a first approximation of the actual primary probability distribution of stream flows (using any regular time interval such as a
day or an hour) of the form (5). Differentiating $u$ with respect to $x$, one
obtains

$$k\,dx = (a - x)\,du, \tag{7}$$

which means that to a constant probability increment $\Delta u$ there corresponds a
maximum increment $\Delta x$ in measure of stream flow equal to $(a/k)\Delta u$ when $x$
is at the lower limit zero. This corresponding increment in stream flow decreases
linearly to zero as $x$ approaches its upper bound $a$, imposed because of the
existence of a finite watershed.

2. Technique of fitting probability distribution of maximum values in case
primary probability distribution is of the limited type (5)-(6). Write the cumulative maximum distribution (3) in the form

$$W(x) = \exp(-e^{-y}), \qquad y = \alpha(u(x) - u_1), \qquad u(x) = k[b - \log(a - x)], \qquad 0 \le x \le a. \tag{8}$$





BRADFORD F. KIMBALL 


Now it is known that for the distribution

$$dW = e^{-y - e^{-y}}\,dy, \tag{9}$$

the mean value and standard deviation of $y$ are given by

$$\bar y = 0.577215 \ \text{(Euler's constant } C), \qquad \sigma(y) = \pi/\sqrt6. \tag{10}$$

Hence

$$\bar y = \alpha[\bar u(x) - u_1] = \alpha k[(b - u_1/k) - L] = C,$$

where $L$ denotes the mean value of $\log(a - x)$, with $x$ representing the observed
maximum flood flows. Also

$$\sigma(y) = \alpha k\,\sigma(L) = \pi/\sqrt6,$$

where $\sigma(L)$ denotes the standard deviation of $\log(a - x)$. Hence

$$\alpha k = \frac{\pi/\sqrt6}{\sigma(L)}, \qquad b - \frac{u_1}{k} = \frac{C}{\alpha k} + L, \tag{11}$$

and $y$ is determined as a function of $x$ by the relation

$$y = \alpha k\Big[\Big(b - \frac{u_1}{k}\Big) - \log(a - x)\Big]. \tag{12}$$

It is interesting to observe that it has not been necessary to determine the
constants $k$ and $b$ of the primary probability distribution. Only the upper
bound $a$ and observed flood flows are used in this process. From the relation
(12) the theoretical curve in terms of $x$ may easily be computed from tables
relating $y$ to $W$ (see Gumbel, loc. cit., Table II, page 173).

The difficulty of determining what the upper bound a should be in a specific 
case is a practical one and does not concern the objective theoretical problem 
of choosing the type of curve which most nearly describes the behavior of annual 
maximum flood flows. The point to be made in this paper is that the use of
what seems to be a reasonable value of $a$ will materially alter forecasts of future
annual flood flows relative to forecasts made on the assumption that such an
upper limit may be neglected. It is also ventured that the resulting theoretical
probability distribution of maxima will in general give a better fit to the series
of observed floods than one based on the latter premise. Techniques for determination of upper bound $a$ will not be discussed in this paper.


3. Examples. In order to demonstrate the point in question the two methods
have been applied to a 57 year record of the annual flood flows of the Tennessee
River at Chattanooga for the years 1875 to 1931.²


² The author has already used this series in a previous article [3] and for this reason has
found it convenient to use it here.





TABLE I

Series of observed annual flood flows
(Tennessee River at Chattanooga, 1875-1931)

(1)              (2)         (3)            (4)
Observed Flood,  Ratio to    Per cent of    Return Period,
x                Mean        Time           T(x)

85.9             .412         0.88           1.007
108              .518         2.63           1.027
123              .590         4.39           1.043
...              ...          ...            ...
310              1.487       95.61          22.8
349              1.674       97.37          38.0
361              1.731       99.12          114.


In Table I, col. (1) is shown the incomplete series of observed annual floods in
units of 1,000 c.f.s. arranged in order of magnitude. The complete series may
be referred to in Water-Supply Paper 771 entitled "Floods in the United States,"
U. S. Geological Survey, 1936, p. 401. The mean annual maximum flood of this
series is 208.56. The ratio of each annual maximum to the mean is shown in
Col. (2). In Col. (4) is shown the observed return period, which is taken here
as the harmonic mean between what has been called the exceedance interval and
the recurrence interval (see Gumbel, loc. cit., Table I, p. 167). Thinking of the
57 year record as a span of 57 years, the above procedure is equivalent to taking
the observed probability $W(x)$ that a given annual flood will not be exceeded
as the mid-point of the part of this time-span covered by the observed flood in
question. Thus the lowest flood-peak, 85,900 c.f.s., corresponds to the span
from zero to 1.754 per cent of the whole time-span, and hence $W(x)$ is taken at
the mid-point, 0.877 per cent. Similarly the greatest flood, 361,000 c.f.s.,
corresponds to the interval from 98.246 to 100 per cent and is taken at 99.12 per
cent. These arithmetic means correspond to harmonic means of the "recurrence" and "exceedance" intervals referred to above. This is the procedure
which Hazen [4] originally followed.
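The mid-point rule can be written as $W_m = (m - \frac12)/n$ for the $m$-th smallest of $n$ annual floods, with $T_m = 1/(1 - W_m)$ from (2). The sketch below uses hypothetical helper names and assumes that the two intervals whose harmonic mean is taken are $n/(n-m+1)$ and $n/(n-m)$. Under that assumption it reproduces the plotting positions and return periods quoted for the 57-year Tennessee record and confirms that the two descriptions coincide.

```python
def hazen_positions(n):
    """(W_m, T_m) for m = 1..n, floods ranked in ascending order: W = (m - 1/2)/n, T = 1/(1 - W)."""
    return [((m - 0.5) / n, 1 / (1 - (m - 0.5) / n)) for m in range(1, n + 1)]


def harmonic_mean_interval(n, m):
    """Harmonic mean of the assumed intervals n/(n - m + 1) and n/(n - m), for m < n."""
    a, b = n / (n - m + 1), n / (n - m)
    return 2 / (1 / a + 1 / b)


pos = hazen_positions(57)
print(round(100 * pos[0][0], 3), round(pos[-1][1], 1))   # per cent for the lowest flood, T for the highest
```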

Data from Cols. (1) and (4) of this table determined the position of dots on Fig. 1.
Data from Cols. (2) and (3) gave the points indicated by dots on Fig. 2, with
$1 - W(x)$ recorded on the chart rather than $W(x)$.

The two theoretical distributions fitted to these annual flood maxima will be 
referred to as distributions A and B. 

Distribution A. In this case the limited type of primary probability distribution (5)-(6) is assumed. From previous studies of this data series made by
the author [3], an upper limit of annual floods of some 609,000 c.f.s. was found
to be reasonable, and for purposes of this example the same upper limit will be
assumed for the primary probability distribution. Thus the transformation
(6) becomes:

$$u = k[b - \log(609 - x)],$$



Fig. 1. Comparison of methods of fitting annual flood peaks (Tennessee River at
Chattanooga, 1875-1931): return periods plotted against annual flood discharges on
semi-logarithmic chart.

where the logarithm to base 10 can be used without loss of generality since the
constant $k$ will absorb the conversion factor. The mean value of the logarithm
and its standard deviation come to

$$L = 2.59772, \qquad \sigma(L) = 0.06576.$$

The constants of the transformation (12) are thus determined by

$$\alpha k = \frac{\pi/\sqrt6}{0.06576}, \qquad b - \frac{u_1}{k} = \frac{C}{\alpha k} + 2.59772.$$




Thus

$$\frac{1}{\alpha k} = 0.05127, \qquad b - \frac{u_1}{k} = 2.6273,$$

and solving (12) for $\log(609 - x)$,

$$\log(609 - x) = 2.6273 - 0.05127\,y. \tag{13}$$

Using a table for the known relations between $y$, $W(x)$, and $T(x)$ for the Fisher-Tippett distribution of maximum values similar to Table II of Gumbel's article
(loc. cit.), the corresponding values $x$ of the annual floods are easily determined.
Thus a theoretical relation between $x$ and $W(x)$ is set up. This is indicated as
Curve A on the two charts exhibited here.

Fig. 2. Comparison of methods of fitting annual flood peaks (Tennessee River at
Chattanooga, 1875-1931): data plotted on logarithmic probability chart designed by
Hazen, Whipple and Fuller.
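The fitting steps for Distribution A can be sketched in a few lines. Instead of a printed table, the sketch computes the reduced variate from $W = 1 - 1/T$ and $W = \exp(-e^{-y})$. It takes the published summary values $L = 2.59772$, $\sigma(L) = 0.06576$ and the assumed upper limit 609 (thousand c.f.s.) as inputs, because the full flood series is not reproduced here. The function names are hypothetical.

```python
import math

EULER_C = 0.5772156649


def limited_constants(L, sigma_L):
    """Equation (11): returns 1/(alpha k) and b - u_1/k."""
    alpha_k = (math.pi / math.sqrt(6)) / sigma_L
    return 1 / alpha_k, EULER_C / alpha_k + L


def reduced_variate(T):
    """y for return period T, from W = 1 - 1/T and W = exp(-exp(-y))."""
    return -math.log(-math.log(1 - 1 / T))


def flood_curve_A(T, upper=609.0, L=2.59772, sigma_L=0.06576):
    """Equation (13) solved for x, with base-10 logarithms as in the text."""
    inv_ak, shift = limited_constants(L, sigma_L)
    return upper - 10 ** (shift - inv_ak * reduced_variate(T))


inv_ak, shift = limited_constants(2.59772, 0.06576)
print(round(inv_ak, 5), round(shift, 4), round(flood_curve_A(1000), 1))
```

Because $\log(609 - x)$ stays finite for every $y$, Curve A approaches but never reaches the assumed upper bound, however long the return period.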

Distribution B. The primary probability distribution in this case is taken
as unlimited to the right, and in general is assumed to have the character of an
exponentially decreasing function of the measure of stream flow $x$ (see Gumbel,
loc. cit.). The parameter $y$ of the distribution of annual maxima is given
directly by

$$y = \alpha(x - u_1)$$

and

$$\frac1\alpha = \frac{\sqrt6}{\pi}\,(\text{stand. dev. of annual floods}) = (0.77970)(58.26) = 45.425,$$

$$u_1 = (\text{mean annual flood}) - \frac{C}{\alpha} = 208.6 - (0.57722)(45.425) = 182.4.$$

Hence

$$x = 182.4 + 45.425\,y, \tag{14}$$

and using the table of corresponding values of $y$, $W(x)$ and $T(x)$ for the Fisher-Tippett distribution referred to above, a theoretical relation between $x$ and
$W(x)$ is easily set up. This is plotted as Curve B on the accompanying charts.

4. Discussion of examples. In Fig. 1 it is to be noted that if the theoretical
curves are continued to the right to give readings for a return period of 1,000
years, the divergence of Curve A from Curve B is large enough to be of numerical significance. Visual inspection does not indicate which curve is the
better fit to the observation points.

In Fig. 2 the curves are plotted on "logarithmic probability" graph paper.
This paper was designed by Hazen and Fuller [4] specifically for the purpose of
plotting annual maxima of stream-flows. A significant divergence in trend is
to be noted at the right hand end.

These charts indicate that the use of an upper limit may materially affect
extrapolation of fitted theoretical curves for purposes of estimating floods with
a return period of, say, 1,000 years.

If the trends of observed floods in Gumbel's recent paper in the Transactions
of the American Geophysical Union [5] are examined, it will be observed that
in the case of the Connecticut, Mississippi and Rhone rivers there is a decided
tendency for the curve of observed floods to turn downwards, away from the
theoretical curves, which correspond to Curve B exhibited in Figure 1. In
the case of the Tennessee, Cumberland and Columbia rivers the tendency is
not decisive, while in the case of the Rhine river at Basel (Switzerland) the
tendency of the observed curve is upwards rather than downwards. As the
writer has observed elsewhere [6], this last data series seems to be rather unusual
in character and is possibly the result of a watershed greatly influenced by
year-round snow deposits. Possibly a radically different primary probability distribution should be used in this case.

5. Conclusion. The writer has demonstrated in this paper that in fitting a
theoretical probability distribution of maximum values to annual maxima of
stream flows, the use of an upper bound for measures of stream flow by assumption of a primary probability distribution of the type (5)-(6)

(1) is not inconsistent with the use of the Fisher-Tippett distribution of
maxima,

(2) has a reasonable logical basis from the point of view of the hydrologist,

(3) may materially affect the estimation of return periods when extrapolation
is involved, relative to results obtained when no upper bound is assumed.

It has not been within the scope of this paper to discuss techniques for determining such an upper bound, nor to apply the theory to enough data series
to draw conclusions concerning goodness of fit.

REFERENCES

[1] E. J. Gumbel, "The return period of flood flows," Annals of Math. Stat., Vol. 12 (1941), pp. 163-190.

[2] Richard von Mises, "La distribution de la plus grande de n valeurs," Revue Mathématique de l'Union Interbalkanique, Vol. 1, Athens (1936).

[3] Bradford F. Kimball, "Probability-distribution-curve for flood-control studies," Trans. Am. Geophysical Union, 1938, pp. 466-475.

[4] Allen Hazen, Flood-flows, John Wiley & Sons, New York, 1930.

[5] E. J. Gumbel, "Probability-interpretation of the observed return-periods of floods," Trans. Am. Geophysical Union, 1941, pp. 836-850.

[6] Bradford F. Kimball, Discussion of paper entitled "Statistical control curves for flood discharges" by E. J. Gumbel, Trans. Am. Geophysical Union, 1942.



LINEAR RESTRICTIONS ON CHI-SQUARE 


By Franklin E. Satterthwaite 
University of Iowa 


Chi-square is a statistic widely used in statistical analysis. It is usually of
the form,

$$\chi^2 = \sum_{i=1}^{n} x_i^2, \tag{1}$$

where the $x_i$'s are independent normally distributed variables drawn from populations with respective means and standard deviations $m_i$ and $\sigma_i$. In practical
problems the independence of the $x_i$'s is often modified by placing restrictions
on the $x_i$'s in order to estimate the $m_i$'s or $\sigma_i$'s. It is well known that if $m$ such
restrictions which are linear and homogeneous (also algebraically independent)
are placed on the $x_i$'s, then the resulting chi-square (1) is distributed according
to the chi-square distribution with $n - m$ degrees of freedom. The purpose of
this paper is to study the case where the restrictions are not necessarily
homogeneous.


1. Geometrical development. The $x_i$'s of equation (1) may be considered as co-ordinates in an $n$-dimensional space. Equation (1) represents a sphere in such a space with its center at the origin and with radius $\chi$. We should like to determine the distribution of $\chi^2$. First, since the $x_i$'s are independent, we may form their joint distribution,¹

$$F(x_1, x_2, \ldots, x_n)\,dV = K\prod_i e^{-\frac12 x_i^2}\,dx_i = K e^{-\frac12\sum x_i^2}\,dx_1\,dx_2\cdots dx_n = K e^{-\frac12\chi^2}\,dV. \tag{2}$$

We may change the variable in (2) to $\chi^2$ if we can determine $dV$. Since the $n$-dimensional sphere represented by equation (1) has a volume proportional to $\chi^n$, we may write

$$dV = K\,d(\chi^2)^{\frac12 n} = K(\chi^2)^{\frac12 n - 1}\,d\chi^2.$$

Substituting this value in the distribution (2) we obtain for the distribution of chi-square,

$$F(\chi^2)\,d\chi^2 = K(\chi^2)^{\frac12 n - 1} e^{-\frac12\chi^2}\,d\chi^2,$$

which is the usual form of the chi-square distribution for $n$ degrees of freedom.
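The density just obtained can be checked by simulation. The sketch below is an illustration added here, not part of the paper: it draws sums of $n$ squared standard normal deviates with Python's standard library and compares their sample mean and variance with $n$ and $2n$, the moments of the chi-square distribution with $n$ degrees of freedom. The function name, sample size and seed are arbitrary choices.

```python
import random
import statistics


def simulate_chi_square(n: int, trials: int, seed: int = 1) -> list:
    """Return `trials` draws of chi^2 = x_1^2 + ... + x_n^2 with independent N(0, 1) x_i."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]


n = 5
samples = simulate_chi_square(n, trials=20000)
# The chi-square distribution with n degrees of freedom has mean n and variance 2n.
print(round(statistics.fmean(samples), 2), round(statistics.pvariance(samples), 2))
```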

1 The letter K will be used throughout as a constant, not necessarily the same constant 
from equation to equation. 




We shall next restrict the values of the $x_i$ by means of a condition,

$$a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = p_1, \qquad \sum_i a_{1i}^2 = 1, \tag{3}$$

where $p_1$ is a constant. This restriction represents a hyper-plane in our $n$-dimensional space at a distance $p_1$ from the origin. The intersection of this hyperplane with our sphere (1) is an $(n-1)$-dimensional sphere with radius

$$\chi' = (\chi^2 - p_1^2)^{\frac12}.$$

The differential of the volume of this sphere is

$$dV = K(\chi^2 - p_1^2)^{\frac12(n-1) - 1}\,d\chi^2.$$

Substituting this in the distribution (2) we obtain the distribution of chi-square subject to the single linear restriction (3). Thus

$$F(\chi^2)\,d\chi^2 = K(\chi^2 - p_1^2)^{\frac12(n-1) - 1} e^{-\frac12\chi^2}\,d\chi^2,$$

or, more conveniently,

$$F(\chi^2 - p_1^2)\,d(\chi^2 - p_1^2) = K(\chi^2 - p_1^2)^{\frac12(n-1) - 1} e^{-\frac12(\chi^2 - p_1^2)}\,d(\chi^2 - p_1^2).$$

The argument may be readily extended to include additional linear restrictions of the form,

$$\begin{aligned} a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= p_2, \qquad \sum_i a_{2i}^2 = 1,\\ &\ \,\vdots\\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= p_m, \qquad \sum_i a_{mi}^2 = 1. \end{aligned} \tag{4}$$

For convenience we shall assume that the restrictions form an orthogonal set² so that

$$\sum_i a_{ji}a_{ki} = 0, \qquad j \neq k.$$

The hyper-plane represented by the first equation of (4) is at a distance $p_2$ from the origin. Since (4) is orthogonal to (3), it is also at a distance $p_2$ from the center of the $(n-1)$-dimensional sphere obtained on applying the first restriction. Therefore the intersection of this hyper-plane with the $(n-1)$-dimensional sphere will give an $(n-2)$-dimensional sphere of radius

$$\chi'' = (\chi^2 - p_1^2 - p_2^2)^{\frac12}.$$

Similarly, if we consider all $m$ restrictions, we obtain an $(n-m)$-dimensional sphere with radius

$$\chi^{(m)} = \Big(\chi^2 - \sum_{i=1}^{m} p_i^2\Big)^{\frac12}.$$


² Any set of linear restrictions which are algebraically independent and consistent may be replaced by an orthogonal set. Thus if (4) were not orthogonal to (3), we could replace (4) by (4) $-\,k$(3), where $k$ is determined by the condition

$$\sum_i a_{1i}(a_{2i} - k a_{1i}) = 0$$

or

$$\sum_i a_{1i}a_{2i} = k\sum_i a_{1i}^2.$$
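The replacement described in this footnote is a single Gram-Schmidt step. In the minimal sketch below (an illustration; the (coefficients, constant) representation and the function name are assumptions of the example), note that subtracting $k$ times (3) also subtracts $kp_1$ from the right-hand side, and that the result is rescaled so its squared coefficients again sum to 1, which makes the new constant the distance of the plane from the origin.

```python
import math


def orthogonalize(first, second):
    """Replace `second` by second - k*first, k chosen so the two restrictions are orthogonal.

    Each restriction is a pair (coefficients, constant) meaning sum(a_i * x_i) = p.
    """
    a1, p1 = first
    a2, p2 = second
    k = sum(u * v for u, v in zip(a1, a2)) / sum(u * u for u in a1)
    b = [v - k * u for u, v in zip(a1, a2)]
    q = p2 - k * p1                      # the constant changes along with the coefficients
    norm = math.sqrt(sum(v * v for v in b))
    # Rescale so that sum(b_i^2) = 1; q / norm is then the plane's distance from the origin.
    return [v / norm for v in b], q / norm


r3 = ([1 / math.sqrt(2), 1 / math.sqrt(2), 0.0], 1.0)   # plays the role of (3)
r4 = ([1.0, 0.0, 0.0], 2.0)                              # plays the role of (4)
coeffs, dist = orthogonalize(r3, r4)
print([round(c, 4) for c in coeffs], round(dist, 4))
```

Any point satisfying both original restrictions also satisfies the new one, so the intersection of the planes, and hence the argument in the text, is unchanged.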





The differential of the volume of this sphere will be

$$dV = K\Big(\chi^2 - \sum p_i^2\Big)^{\frac12(n-m) - 1}\,d\Big(\chi^2 - \sum p_i^2\Big).$$

Substituting this in (2) we see that

$$\chi^2 - \sum_{i=1}^{m} p_i^2$$

is distributed as is chi-square with $n - m$ degrees of freedom.
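The theorem can also be checked numerically. Restricting a standard normal vector to the planes $a_j \cdot x = p_j$ (orthonormal $a_j$) amounts to replacing its components along the $a_j$ by the $p_j$ while leaving the orthogonal part untouched. The sketch below (ours; $n = 6$, $m = 2$, the vectors, the $p_j$ and the seed are arbitrary) does this and checks that $\chi^2 - \sum p_j^2$ has mean $n - m$ and variance $2(n - m)$, as a chi-square variable with $n - m$ degrees of freedom should.

```python
import math
import random
import statistics


def restricted_chi_square(normals, dists, n, trials, seed=2):
    """Draws of chi^2 = sum x_i^2 for N(0, 1) vectors x restricted to a_j . x = p_j.

    `normals` is assumed orthonormal; each draw replaces the component of a free
    normal vector z along every a_j by the prescribed value p_j.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(trials):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        x = z[:]
        for a, p in zip(normals, dists):
            t = sum(ai * zi for ai, zi in zip(a, z))        # component of z along a
            x = [xi + (p - t) * ai for xi, ai in zip(x, a)]
        draws.append(sum(xi * xi for xi in x))
    return draws


n = 6
s = 1 / math.sqrt(n)
a1 = [s] * n                                 # unit vector (1, ..., 1)/sqrt(6)
a2 = [s * (-1) ** i for i in range(n)]       # unit vector orthogonal to a1
p = [1.5, -0.5]
shifted = [c - sum(v * v for v in p) for c in restricted_chi_square([a1, a2], p, n, 20000)]
# Expected: mean n - m = 4 and variance 2(n - m) = 8.
print(round(statistics.fmean(shifted), 2), round(statistics.pvariance(shifted), 2))
```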


2. Alternate analytic development. It is perhaps desirable that we present an analytic proof of the foregoing theorem. Therefore we shall first regard the $p_i$'s as variables and shall determine the joint distribution of $\chi^2$ and the $p_i$'s. We may then pass to the distribution of those values of $\chi^2$ which correspond to assigned values of the $p_i$'s. Note that the $x_i$'s are considered to be statistically independent.

The characteristic function of the joint distribution of $\chi^2$ and the $p_i$'s is known to be³

$$(1 - 2it)^{-\frac12 n}\,e^{-Q/2(1 - 2it)},$$

where

$$Q = \sum_{i,j} t_i t_j \sum_k a_{ik}a_{jk} = \sum_i t_i^2, \qquad \text{since } \sum_k a_{ik}a_{jk} = \delta_{ij}.$$

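Applying the Fourier transform, we obtain the joint distribution of $\chi^2$ and the $p_i$'s:

$$f(\chi^2, p_1, \ldots, p_m) = K\int\cdots\int (1 - 2it)^{-\frac12 n} e^{Q'}\,dt_1\cdots dt_m\,dt,$$

where

$$\begin{aligned} Q' &= -it\chi^2 - \sum_j it_jp_j - \frac{Q}{2(1 - 2it)}\\ &= -it\chi^2 - \sum_j \frac{[t_j + ip_j(1 - 2it)]^2}{2(1 - 2it)} - \tfrac12(1 - 2it)\sum_j p_j^2. \end{aligned}$$

Performing the integration with respect to the $t_j$ we have,

$$F = K\int (1 - 2it)^{-\frac12(n-m)} e^{-it(\chi^2 - \sum p_j^2) - \frac12\sum p_j^2}\,dt,$$

and finally,

$$F = K\Big(\chi^2 - \sum p_j^2\Big)^{\frac12(n-m) - 1} e^{-\frac12\chi^2}.$$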


³ See A. T. Craig, "A certain mean value problem in statistics," Bull. Amer. Math. Soc., Vol. 42 (1936), p. 671.





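In our problem we want the distribution of $\chi^2$ (or, more conveniently, of $\chi^2 - \sum p_i^2$) when the $p_i$'s take on fixed values. To obtain this we substitute fixed values of the $p_i$'s into the joint distribution and divide by the marginal total,

$$\int F(\chi^2, p_1, p_2, \ldots, p_m)\,d\chi^2 = K\,\Gamma[\tfrac12(n-m)]\,2^{\frac12(n-m)}\,e^{-\frac12\sum p_i^2}.$$

This gives us the distribution function,

$$F\Big(\chi^2 - \sum p_i^2\Big) = \frac{1}{2^{\frac12(n-m)}\,\Gamma[\frac12(n-m)]}\Big(\chi^2 - \sum p_i^2\Big)^{\frac12(n-m) - 1} e^{-\frac12(\chi^2 - \sum p_i^2)},$$

which is a chi-square distribution with $n - m$ degrees of freedom.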

3. Application. As an example of the use of linear restrictions on chi-square 
we shall now examine the effect on the chi-square test of goodness of fit if the 
moments of a sample are not corrected for grouping errors in fitting a frequency 
curve. 

The parameters of the fitted frequency distribution, $f(x)$, are determined from the equations,

$$N\int x^k f(x)\,dx = \sum_j X_j^k\theta_j, \qquad k = 0, 1, 2, \ldots, \tag{5}$$

where $X_j$ is the mid-point of the $j$th group and $\theta_j$ the corresponding observed frequency. Next a set of expected frequencies,

$$\hat\theta_j = \int_{a_j}^{a_{j+1}} Nf(x)\,dx, \qquad a_j = (X_{j-1} + X_j)/2,$$

is determined by taking partial areas of the fitted frequency distribution. The expected frequency is used to transform the actual frequency into a statistic with mean zero and unit variance by the equation,

$$x_j = (\theta_j - \hat\theta_j)/\sqrt{\hat\theta_j}.$$

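Equations (5) may now be rearranged into the form of linear restrictions on the $x_j$. Thus

$$\sum_j X_j^k\sqrt{\hat\theta_j}\,x_j = p_k, \tag{6}$$

where the $p_k$ have the values,

$$p_k = \sum_j X_j^k\theta_j - \sum_j X_j^k\hat\theta_j = N\int x^k f(x)\,dx - \sum_j X_j^k\hat\theta_j \neq 0 \text{ in general.}$$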





To make our example more specific, let us fit a normal distribution to a sample of 1000 items with mean zero and unit variance. Let the grouping be about the midpoints,

$$X_j = -3, -2, -1, 0, 1, 2, 3.$$

The expected frequencies in each group are

$$\hat\theta_j = 6, 61, 242, 382, 242, 61, 6.$$
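These expected frequencies can be reproduced from partial areas of the normal curve. The sketch below assumes, as the printed figures suggest, that the two end groups absorb the tails beyond $\pm 2.5$; the normal distribution function is computed from `math.erf`, and the names are ours. The computed frequencies agree with the printed ones to within one unit (the central group comes out as 382.9), and the variance $\sum X_j^2\hat\theta_j/N$ comes out as 1.080.

```python
import math


def phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


N = 1000
midpoints = [-3, -2, -1, 0, 1, 2, 3]
# Group boundaries a_j lie halfway between midpoints; the end groups run to infinity.
edges = [-math.inf] + [m + 0.5 for m in midpoints[:-1]] + [math.inf]
expected = [N * (phi(hi) - phi(lo)) for lo, hi in zip(edges, edges[1:])]
variance = sum(m * m * e for m, e in zip(midpoints, expected)) / N
print([round(e, 1) for e in expected], round(variance, 3))
```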

The variance of these expected frequencies is 1.080 as contrasted with 1.000 for the sample. The linear restrictions (6) now take the forms,

$$2.4x_{-3} + 7.8x_{-2} + 15.6x_{-1} + 19.5x_0 + 15.6x_1 + 7.8x_2 + 2.4x_3 = 0 \tag{7}$$

$$-7.2x_{-3} - 15.6x_{-2} - 15.6x_{-1} + 0 + 15.6x_1 + 15.6x_2 + 7.2x_3 = 0 \tag{8}$$

$$21.6x_{-3} + 31.2x_{-2} + 15.6x_{-1} + 0 + 15.6x_1 + 31.2x_2 + 21.6x_3 = -80 \tag{9}$$

Because of the symmetry of the normal distribution, restriction (8) is orthogonal to (7) and (9). Therefore the only orthogonalization necessary is to replace (9) by an equivalent restriction which is orthogonal to (7). This can be done by subtracting 1.080 times (7) from (9), which gives

$$19.0x_{-3} + 22.8x_{-2} - 1.2x_{-1} - 21.1x_0 - 1.2x_1 + 22.8x_2 + 19.0x_3 = -80 \tag{10}$$

If these restrictions are each divided by the square root of the sum of the squares of the coefficients of the $x_j$, they will be the normal orthogonal set required by the development. The distances of these restrictive planes from the center of the $\chi^2$-sphere are

$$p_{(7)} = 0, \qquad p_{(8)} = 0, \qquad p_{(10)} = 1.7.$$

Thus if we test the goodness of fit of the normal distribution to this sample by calculating chi-square,

$$\chi^2 = \sum x_j^2 = \sum \frac{(\theta_j - \hat\theta_j)^2}{\hat\theta_j},$$

we should subtract from $\chi^2$ a correction of

$$\sum p_i^2 = 2.8$$

before judging the significance. This correction adjusts for the effect of the grouping error on the chi-square test.
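The correction can be recomputed from restrictions (7), (8) and (9). The sketch below (ours, not the author's arithmetic; the function name is hypothetical) builds the coefficients $X_j^k\sqrt{\hat\theta_j}$ from unrounded normal-curve frequencies, takes $p_0 = p_1 = 0$ and $p_2 = N - \sum X_j^2\hat\theta_j$ for a sample with mean 0 and variance 1, orthonormalizes the restrictions in order (the step that turns (9) into (10)), and returns $\sum p_i^2$. With unrounded coefficients it gives about 2.83, consistent with the printed 2.8.

```python
import math


def phi(z: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def grouping_correction(N: float) -> float:
    """Sum of p_i^2 for the normal fit of the example, grouped at -3, ..., 3."""
    mids = [-3, -2, -1, 0, 1, 2, 3]
    edges = [-math.inf] + [x + 0.5 for x in mids[:-1]] + [math.inf]
    theta = [N * (phi(b) - phi(a)) for a, b in zip(edges, edges[1:])]
    # Restrictions (6) for k = 0, 1, 2: coefficients X_j^k * sqrt(theta_j).
    rows = [[x ** k * math.sqrt(t) for x, t in zip(mids, theta)] for k in range(3)]
    # Right-hand sides p_k when the sample has mean 0 and variance 1.
    rhs = [0.0, 0.0, N - sum(x * x * t for x, t in zip(mids, theta))]
    done = []            # restrictions already made orthonormal: (coefficients, distance)
    total = 0.0
    for row, p in zip(rows, rhs):
        for prev, q in done:
            c = sum(u * v for u, v in zip(prev, row))
            row = [v - c * u for u, v in zip(prev, row)]
            p -= c * q
        norm = math.sqrt(sum(v * v for v in row))
        row, p = [v / norm for v in row], p / norm
        done.append((row, p))
        total += p * p
    return total


print(round(grouping_correction(1000), 2))
```

Because every $\hat\theta_j$ and $p_2$ scale with $N$, the result is exactly proportional to the sample size: about 0.28 for $N = 100$ and 28.3 for $N = 10{,}000$.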

In this example, chi-square has four degrees of freedom, so that an error of 2.8 is large enough to affect our judgment of its significance. It can be shown that the correction is proportional to the size of the sample. Therefore, if our sample had contained only 100 items, the fit obtained by ignoring grouping effects would be almost as good as the fit when the sample moments were corrected for grouping. On the other hand, if the sample had 10,000 items, it would be practically impossible to obtain a satisfactory fit without correcting for grouping errors.
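To see why a shift of 2.8 matters with four degrees of freedom, one can use the closed form of the chi-square tail for an even number $\nu$ of degrees of freedom, $P(\chi^2 > x) = e^{-x/2}\sum_{i < \nu/2}(x/2)^i/i!$; for four degrees of freedom this is $e^{-x/2}(1 + x/2)$. The observed value 10.0 in the sketch below is purely hypothetical.

```python
import math


def chi2_tail_even(x: float, df: int) -> float:
    """P(chi^2 > x) for an even number df of degrees of freedom."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    h = x / 2.0
    return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))


observed = 10.0      # hypothetical uncorrected chi-square for the 7-group fit
correction = 2.8     # grouping correction from the example with N = 1000
print(round(chi2_tail_even(observed, 4), 3), round(chi2_tail_even(observed - correction, 4), 3))
```

With these made-up numbers the uncorrected value would be judged significant at the 5 per cent level (tail probability about 0.040), while the corrected value would not (about 0.126).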


4. Conclusion. The theory of the loss of degrees of freedom for chi-square 
when the underlying statistics are subject to linear restrictions does not require 
the restrictions to be homogeneous. For restrictions which are not homogeneous, 
a correction must be subtracted from chi-square equal to the square of the distance from the center of the sphere,

$$\chi^2 = \sum x_i^2 = 0,$$

to the intersection of the restrictive planes. Non-homogeneous restrictions
sometimes arise in practice because of the bias introduced by an approximation. 
An example is given from curve fitting. 



SYSTEMS OF LINEAR EQUATIONS WITH COEFFICIENTS SUBJECT TO ERROR

By A. T. Lonseth
Iowa State College

1. Introduction. Various scientific problems lead to non-homogeneous systems of $n$ linear equations in $n$ unknowns, in which the $n^2 + n$ coefficients (including "absolute" terms) are subject to error. Such errors may be errors of observation, or errors introduced by rounding off decimal expansions. If the system has a non-vanishing determinant, the ordinary rules yield the solution. But the question arises: how may the possible errors in the coefficients affect the solutions? In particular, one would like to know how to exclude the fatal event that some malicious combination of errors might make the determinant zero. One would further like to have limitations on the solution-errors in terms of maximum coefficient-errors. Considering the coefficient-errors as random variables, one may also inquire as to the probability distributions of the solution-errors.

The principal result obtained in this paper is the Taylor's expansion of the 
error in any unknown, considered as a function of the n(n + 1) errors in the 
coefficients. An upper bound is obtained for each term of this series, and the 
sum of these upper bounds (when convergent) is expressed in closed form. Thus 
are obtained not only approximations to the maximum error, but an actual upper 
limit. Convergence of the power series is established for sufficiently small 
coefficient-errors; "sufficient smallness" is specified in terms of a simple criterion, 
which simultaneously provides a sufficient condition for the non-vanishing of a 
determinant with elements subject to error. 

These results were obtained before I learned that work had already been done on the problem. The earliest seems to be that of F. R. Moulton [2] in 1913; he found the first order approximation (6) for $n = 3$, and discussed the geometrical reasons for sensitivity. Much later I. M. H. Etherington [1], evidently unaware of Moulton's paper, found the expression for the total error of a determinant whose elements may be in error, and applied this to the present problem. He thus found limits for the first and second order errors, in a rather different form from mine. The probabilistic considerations of section 5 were suggested by Etherington's article. L. B. Tuckerman [3] recently discussed the question of estimating computational errors incurred in the course of solution. He considered only errors of first order.

My original procedure was to compute the terms of the Taylor's series as successive differentials of the unknown, from Cramer's formula. This soon becomes laborious, and I found only the first two terms. The linear matrix equation (4) was then kindly suggested to me by R. Oldenburger. Here (4) is solved by iteration, resulting in a simple recursion formula for successive terms of the Taylor's series.




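2. Formal matrix solution. Let the system of equations be

$$\sum_{j=1}^{n} a_{ij}x_j = c_i, \qquad i = 1, 2, \ldots, n. \tag{1}$$

In terms of the matrices

$$A = \begin{pmatrix} a_{11} & \cdots & a_{1n}\\ \vdots & & \vdots\\ a_{n1} & \cdots & a_{nn} \end{pmatrix}, \qquad X = \begin{pmatrix} x_1\\ \vdots\\ x_n \end{pmatrix}, \qquad C = \begin{pmatrix} c_1\\ \vdots\\ c_n \end{pmatrix},$$

system (1) can be written

$$AX = C. \tag{2}$$

Supposing that not all $c$'s vanish, and that $\Delta$, the determinant of $A$, does not vanish, there is a unique solution $X$. But the $a$'s and $c$'s, and consequently the $x$'s, are subject to error: let the true value of $a_{ij}$ be $a_{ij} + \alpha_{ij}$; of $c_i$, $c_i + \gamma_i$; and of the resulting $x_j$, $x_j + \xi_j$. We must actually deal with the system

$$(A + \alpha)(X + \xi) = C + \gamma, \tag{3}$$

where we have written

$$\alpha = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n}\\ \vdots & & \vdots\\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}, \qquad \xi = \begin{pmatrix} \xi_1\\ \vdots\\ \xi_n \end{pmatrix}, \qquad \gamma = \begin{pmatrix} \gamma_1\\ \vdots\\ \gamma_n \end{pmatrix}.$$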
Expanding (3) and using (2), we find for the error-matrix $\xi$

$$\xi = m + nX + n\xi, \tag{4}$$

with $m = A^{-1}\gamma$, $n = -A^{-1}\alpha$; $A^{-1}$ is the inverse of $A$. We solve (4) formally for $\xi$ by iteration. Thus

$$\xi = m + nX + n(m + nX) + n^2\xi, \quad \text{etc.};$$

and there results the infinite expansion

$$\xi = \sum_{k=1}^{\infty}\xi^{(k)}: \qquad \xi^{(1)} = m + nX; \qquad \xi^{(k)} = n\xi^{(k-1)}, \quad k > 1. \tag{5}$$

In section 4 convergence of (5) will be established for sufficiently small $|\alpha_{ij}|$.
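The iteration (5) can be tried numerically. The sketch below (ours; the 2×2 system and error matrices are arbitrary, and the helper names are hypothetical) forms $m = A^{-1}\gamma$ and $n = -A^{-1}\alpha$, sums the first eight terms $\xi^{(k)}$, and compares the result with the exact error obtained by solving the perturbed system (3) directly. Plain nested lists are used to keep it self-contained.

```python
def inv2(M):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def error_series(A, alpha, C, gamma, terms):
    """Partial sum of (5): xi^(1) = m + nX, xi^(k) = n xi^(k-1)."""
    Ainv = inv2(A)
    X = matvec(Ainv, C)
    m = matvec(Ainv, gamma)
    n = [[-v for v in row] for row in matmul(Ainv, alpha)]
    term = [mi + ti for mi, ti in zip(m, matvec(n, X))]
    total = term[:]
    for _ in range(terms - 1):
        term = matvec(n, term)
        total = [s + t for s, t in zip(total, term)]
    return total


A = [[4.0, 1.0], [2.0, 3.0]]
C = [1.0, 2.0]
alpha = [[0.01, -0.02], [0.015, 0.005]]
gamma = [0.01, -0.01]

# Exact error: solve (A + alpha)(X + xi) = C + gamma and subtract X.
X = matvec(inv2(A), C)
Xp = matvec(inv2([[a + e for a, e in zip(r, s)] for r, s in zip(A, alpha)]),
            [c + g for c, g in zip(C, gamma)])
exact = [xp - x for xp, x in zip(Xp, X)]
approx = error_series(A, alpha, C, gamma, terms=8)
print(exact, approx)
```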

3. The elements of $\xi^{(k)}$. It is necessary to consider closely the individual elements of $\xi^{(k)}$. Writing

$$\xi^{(k)} = \begin{pmatrix} \xi_1^{(k)}\\ \vdots\\ \xi_n^{(k)} \end{pmatrix},$$

we note from (5) that

$$\xi_j = \sum_{k=1}^{\infty}\xi_j^{(k)};$$

this is precisely the Taylor's series for the error in $x_j$: each $\xi_j^{(k)}$ is a homogeneous polynomial of degree $k$ in the $\alpha$'s and $\gamma$'s. Writing $A_{ij}$ for the cofactor of $a_{ij}$ in $A$,

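$$\xi^{(1)} = m + nX = A^{-1}(\gamma - \alpha X) = \frac{1}{\Delta}\begin{pmatrix} A_{11} & \cdots & A_{n1}\\ \vdots & & \vdots\\ A_{1n} & \cdots & A_{nn} \end{pmatrix}(\gamma - \alpha X),$$

whence (summing hereafter from 1 to $n$ on Greek-letter subscripts)

$$\xi_j^{(1)} = \frac{1}{\Delta}\Big\{\sum_\nu \gamma_\nu A_{\nu j} - x_1\sum_\nu \alpha_{\nu 1}A_{\nu j} - \cdots - x_n\sum_\nu \alpha_{\nu n}A_{\nu j}\Big\}. \tag{6}$$

From (5), if $k > 1$,

$$\xi^{(k)} = n\xi^{(k-1)} = -A^{-1}\alpha\,\xi^{(k-1)} = -\frac{1}{\Delta}\begin{pmatrix} A_{11} & \cdots & A_{n1}\\ \vdots & & \vdots\\ A_{1n} & \cdots & A_{nn} \end{pmatrix}\alpha\,\xi^{(k-1)},$$

so that

$$\xi_j^{(k)} = -\frac{1}{\Delta}\Big\{\xi_1^{(k-1)}\sum_\nu \alpha_{\nu 1}A_{\nu j} + \cdots + \xi_n^{(k-1)}\sum_\nu \alpha_{\nu n}A_{\nu j}\Big\}. \tag{7}$$

The sums $\sum_\nu \gamma_\nu A_{\nu j}$, $\sum_\nu \alpha_{\nu\mu}A_{\nu j}$ have obvious interpretations as determinants.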

4. Bounds and convergence of the series. Assuming $|\alpha_{ij}|, |\gamma_i| \le \delta$ and taking absolute values in (6),

$$|\xi_j^{(1)}| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\mu |x_\mu|\Big)\Big(\sum_\nu |A_{\nu j}|\Big). \tag{8}$$





It will be observed that equality can be attained for a particular choice of $\alpha$'s and $\gamma$'s as $\pm\delta$: the bound for first-order errors is best possible. But it is not in general possible by a single choice of $\alpha$'s and $\gamma$'s to obtain equality for all $j$. Similarly from (7)

$$|\xi_j^{(k)}| \le \frac{\delta}{|\Delta|}\Big(\sum_\mu |\xi_\mu^{(k-1)}|\Big)\Big(\sum_\nu |A_{\nu j}|\Big), \qquad k > 1,$$

whence by induction

$$|\xi_j^{(k)}| \le \Big(\frac{\delta}{|\Delta|}\Big)^k\Big(1 + \sum_\mu |x_\mu|\Big)\Big(\sum_\mu\sum_\nu |A_{\nu\mu}|\Big)^{k-1}\Big(\sum_\nu |A_{\nu j}|\Big). \tag{9}$$

Summing on $k$,

$$\sum_{k=1}^{m}|\xi_j^{(k)}| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\mu |x_\mu|\Big)\Big(\sum_\nu |A_{\nu j}|\Big)\frac{1 - \rho^m}{1 - \rho},$$

with

$$\rho = \frac{\delta}{|\Delta|}\sum_\mu\sum_\nu |A_{\nu\mu}|.$$

If $\rho < 1$, we can let $m \to \infty$:

$$|\xi_j| \le \frac{\delta}{|\Delta|}\Big(1 + \sum_\mu |x_\mu|\Big)\Big(\sum_\nu |A_{\nu j}|\Big)\Big/(1 - \rho). \tag{10}$$

Observing that the $\gamma$'s occur linearly in (6) and (7), we conclude that (5) converges if

$$|\alpha_{ij}|,\ |\gamma_i| \le \delta < |\Delta|\Big/\Big(\sum_\mu\sum_\nu |A_{\nu\mu}|\Big). \tag{11}$$

It follows that the determinant of the system (3) cannot vanish if (11) holds. This is rather remarkable, in that $\delta\sum_\mu\sum_\nu |A_{\nu\mu}|$ is merely the maximum first-order term in the error of that determinant ([1], p. 108); the effect of higher order terms (i.e., of any but first-order minors) in producing a zero determinant can be wholly ignored.

From the remark after (8), it appears that equality in (9) and (10) cannot generally be attained.

If (10) is written $|\xi_j| \le B/(1 - \rho)$, it is easily seen that the remainder after the $h$th approximation does not exceed $\rho^h B/(1 - \rho)$.
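Bound (10) needs only $\Delta$, the cofactors and the solution $X$. The sketch below (ours, for the 2×2 case only, with an arbitrary matrix, $\delta = 0.05$ and seed; the function names are hypothetical) computes $\rho$ and the bound for each unknown, then draws random errors with $|\alpha_{ij}|, |\gamma_i| \le \delta$ and records the largest solution errors actually observed, which should never exceed the bound.

```python
import random


def cofactors2(A):
    """Cofactor matrix of a 2x2 matrix: entry [i][j] is A_ij, the cofactor of a_ij."""
    (a, b), (c, d) = A
    return [[d, -c], [-b, a]]


def solve2(A, C):
    """Solve the 2x2 system A X = C by Cramer's rule."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [(d * C[0] - b * C[1]) / det, (a * C[1] - c * C[0]) / det]


def lonseth_bound(A, C, delta):
    """Return rho and the bounds (10) on |xi_j| for a 2x2 system."""
    (a, b), (c, d) = A
    det = a * d - b * c
    cof = cofactors2(A)
    X = solve2(A, C)
    rho = delta / abs(det) * sum(abs(v) for row in cof for v in row)
    if rho >= 1:
        raise ValueError("condition (11) fails: rho >= 1")
    bounds = []
    for j in range(2):
        col = sum(abs(cof[nu][j]) for nu in range(2))      # sum over nu of |A_nu j|
        bounds.append(delta / abs(det) * (1 + sum(abs(x) for x in X)) * col / (1 - rho))
    return rho, bounds


A, C, delta = [[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0], 0.05
rho, bounds = lonseth_bound(A, C, delta)
X = solve2(A, C)
rng = random.Random(3)
worst = [0.0, 0.0]
for _ in range(2000):
    Ap = [[v + rng.uniform(-delta, delta) for v in row] for row in A]
    Cp = [v + rng.uniform(-delta, delta) for v in C]
    errors = [abs(xp - x) for xp, x in zip(solve2(Ap, Cp), X)]
    worst = [max(w, e) for w, e in zip(worst, errors)]
print(rho, bounds, worst)
```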

5. Probability distributions. We now consider some consequences of the following assumptions: the $\alpha$'s and $\gamma$'s are identically distributed, independent random variables, bounded by a $\delta$ satisfying (11), and distributed symmetrically about zero. (It would be reasonable to assume further that they possess a frequency function which is nowhere concave upward.) Writing $\mathcal{E}(x)$ for "expectation of the random variable $x$," we have

$$\mathcal{E}(\alpha_{ij}) = \mathcal{E}(\gamma_i) = 0, \qquad \mathcal{E}(\alpha_{ij}^2) = \mathcal{E}(\gamma_i^2) = \sigma^2 < \delta^2.$$





On account of independence and symmetry, the expectation of any power-product of $\alpha$'s and $\gamma$'s containing an odd power must be zero. To first order, the mean $a_j$ of the solution-error $\xi_j$ is approximated by

$$a_j^{(1)} = \mathcal{E}(\xi_j^{(1)}) = 0; \tag{12}$$

and the standard deviation $s_j$ by

$$s_j^{(1)} = \sqrt{\mathcal{E}\big(\xi_j^{(1)2}\big)} = \frac{\sigma}{|\Delta|}\Big(1 + \sum_\mu x_\mu^2\Big)^{\frac12}\Big(\sum_\nu A_{\nu j}^2\Big)^{\frac12}. \tag{13}$$

The second approximation to $a_j$ is also easily obtained:

$$a_j^{(2)} = \mathcal{E}(\xi_j^{(1)} + \xi_j^{(2)}) = \frac{\sigma^2}{\Delta^2}\sum_\mu x_\mu\sum_\nu A_{\nu j}A_{\nu\mu}. \tag{14}$$

Both (13) and (14) were given by Etherington [1], though in a less symmetric form. Higher approximations, as he remarks, involve complicated summations; but if they should ever be required, the machinery exists in (6) and (7) for their systematic computation. As to the errors in using (13) for the standard deviation $s_j$ and (14) for the mean, we know only that

$$a_j = a_j^{(2)} + O(\delta^4), \qquad s_j^2 = \big(s_j^{(1)}\big)^2 + O(\delta^4).$$
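For an error law with finitely many values, the expectations in (12) to (14) can be computed exactly by enumeration rather than by simulation. The sketch below (ours) uses a hypothetical three-point law, $-\delta$, 0, $+\delta$ with probabilities 1/4, 1/2, 1/4, which is symmetric, bounded by $\delta$ and has $\sigma^2 = \delta^2/2$. It enumerates all $3^6$ error patterns of a 2×2 system, forms $\xi^{(1)}$ and $\xi^{(2)}$ from (5), and compares their exact moments with (13) and (14).

```python
import itertools
import math

A = [[4.0, 1.0], [2.0, 3.0]]
C = [1.0, 2.0]
delta = 0.05
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
cof = [[A[1][1], -A[1][0]], [-A[0][1], A[0][0]]]               # cof[i][j] = A_ij
Ainv = [[cof[j][i] / det for j in range(2)] for i in range(2)]  # (A^-1)_ij = A_ji / det


def mv(M, v):
    """2x2 matrix times vector."""
    return [M[i][0] * v[0] + M[i][1] * v[1] for i in range(2)]


X = mv(Ainv, C)

# Hypothetical error law: -delta, 0, +delta with probabilities 1/4, 1/2, 1/4.
law = [(-delta, 0.25), (0.0, 0.5), (delta, 0.25)]
sigma2 = delta ** 2 / 2
mean1, mean_sq1, mean2 = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
for combo in itertools.product(law, repeat=6):
    e = [v for v, _ in combo]
    w = math.prod(pr for _, pr in combo)
    alpha, gamma = [[e[0], e[1]], [e[2], e[3]]], [e[4], e[5]]
    aX = mv(alpha, X)
    xi1 = mv(Ainv, [gamma[i] - aX[i] for i in range(2)])   # xi^(1) = A^-1 (gamma - alpha X)
    xi2 = [-v for v in mv(Ainv, mv(alpha, xi1))]          # xi^(2) = -A^-1 alpha xi^(1)
    for j in range(2):
        mean1[j] += w * xi1[j]
        mean_sq1[j] += w * xi1[j] ** 2
        mean2[j] += w * xi2[j]

# Formulas (13) and (14).
s1 = [math.sqrt(sigma2) / abs(det) * math.sqrt(1 + sum(x * x for x in X))
      * math.sqrt(sum(cof[nu][j] ** 2 for nu in range(2))) for j in range(2)]
a2 = [sigma2 / det ** 2 * sum(X[mu] * sum(cof[nu][j] * cof[nu][mu] for nu in range(2))
                              for mu in range(2)) for j in range(2)]
print(mean1, [math.sqrt(v) for v in mean_sq1], s1, mean2, a2)
```

The agreement is exact for the first- and second-order terms; the $O(\delta^4)$ error quoted in the text comes only from the higher terms of the series.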

Etherington ([1], p. 111) considers the important special case of "rounding off" decimal expressions. Each $a$ and $c$ is supposed correct in the $q$th decimal place, the $(q+1)$th figure being "forced," i.e., increased by one when the $(q+2)$th figure is dropped, if the $(q+2)$th is 5, 6, 7, 8, or 9. Assuming constant frequency $10^{q+1}$ in the interval $(-\tfrac12\cdot 10^{-q-1}, \tfrac12\cdot 10^{-q-1})$, we may use (13) and (14) with

$$\sigma^2 = 10^{-2q-2}/12.$$

Errors of observation are often assumed to be normally distributed. There is nothing against such an assumption with regard to the $\gamma$'s, but the $\alpha$'s must not make (3) singular, and must accordingly be suitably bounded, e.g. by (11).

6. Conclusion. The formulas and bounds of this paper involve only these quantities: the determinant $\Delta$, its first order minors, and the solutions of (1). They can be found in the course of solving (1) by orthodox methods.

Inequality (10) definitely limits the maximum solution-errors, in terms of the maximum coefficient-error $\delta$, provided $\delta$ satisfies (11). But it may be that (8), either alone or in conjunction with the second-order bound from (9), will give a better approximation.

The ratio $\sum\sum |A_{\mu\nu}|/|\Delta|$ may be taken as a "measure of sensitivity" of (1) to error.
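The measure of sensitivity is cheap to evaluate. In the 2×2 sketch below (ours; both matrices are made-up examples and the function name is hypothetical) the cofactors are $d, -c, -b, a$, so the ratio is $(|a| + |b| + |c| + |d|)/|\Delta|$; by (11), convergence of (5) is guaranteed for $\delta$ below the reciprocal of this ratio.

```python
def sensitivity2(A):
    """Ratio sum |A_mu nu| / |Delta| for a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    # The cofactors of a 2x2 matrix are d, -c, -b, a.
    return (abs(a) + abs(b) + abs(c) + abs(d)) / abs(det)


well = [[4.0, 1.0], [2.0, 3.0]]
nearly_singular = [[1.0, 1.0], [1.0, 1.001]]
for M in (well, nearly_singular):
    r = sensitivity2(M)
    print(f"ratio = {r:.1f}; (11) guarantees convergence for delta < {1 / r:.2e}")
```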

The fundamental formulas (6) and (7) are capable of solving other problems than those studied here. For example, it may happen that only certain elements (such as those of a single column) are in error, in which case better inequalities can be found. Or the $\alpha$'s and $\gamma$'s may not be independently and identically distributed.





REFERENCES

[1] I. M. H. Etherington, "On errors in determinants," Proc. Edinburgh Math. Soc., Ser. 2, Vol. 3 (1932), pp. 107-117.

[2] F. R. Moulton, "On the solutions of linear equations having small determinants," Amer. Math. Monthly, Vol. 20 (1913), pp. 242-249.

[3] L. B. Tuckerman, "On the mathematically significant figures in the solution of simultaneous linear equations," Annals of Math. Stat., Vol. 12 (1941), pp. 307-316.

[4] P. G. Hoel, "The errors involved in evaluating correlation determinants," Annals of Math. Stat., Vol. 11 (1940), pp. 58-65.



ON MUTUALLY FAVORABLE EVENTS 

By Kai-Lai Chung
Tsing Hua University, Kunming, China 

Introduction. For a set of arbitrary events, E. J. Gumbel, M. Fréchet and the author¹ have recently obtained inequalities between sums of certain probability functions. One of the results of the author is the following:

Let $E_1, \ldots, E_n$ be $n$ arbitrary events and let $p_m(\nu_1, \ldots, \nu_k)$ denote the probability of the occurrence of at least $m$ events out of the $k$ events $E_{\nu_1}, \ldots, E_{\nu_k}$. Then, for $k = 1, \ldots, n - 1$ and $1 \le m \le k$ we have



where the summations extend respectively to all combinations of $k + 1$ and $k$ indices out of the $n$ indices $1, \ldots, n$.

In the course of proof of the above inequalities it appears that similar inequalities between products instead of sums can be obtained under certain assumptions regarding the nature of interdependence of the events. We shall first study the nature of such assumptions, and then proceed to the proof of the said inequalities (Theorems 1 and 2). It may be noted that the inductive method used here serves equally well for the proof of the inequalities cited above, though somewhat longer, but apparently our former method is not applicable here.

That events satisfying our assumptions actually exist is shown by an application to the elementary theory of numbers. The author feels incompetent to discuss other possible fields of application.

1. Let a set of events be given

$$E_1, E_2, \ldots, E_n, \ldots$$

and let $E_i'$ denote the event non-$E_i$. Let $p(i)$ denote the probability of the occurrence of $E_i$, $p(i')$ that of the occurrence of $E_i'$. For convenience we assume that for any $i$, $p(i)[1 - p(i)] \neq 0$; events with the exceptional probabilities 0 or 1 may evidently be left out of account.

Let $p(\nu_1\cdots\nu_k)$ denote the probability of the occurrence of the conjunction $E_{\nu_1}\cdots E_{\nu_k}$ and let $p(\mu_1\cdots\mu_h, \nu_1\cdots\nu_k)$ denote the probability of the occurrence of $E_{\nu_1}\cdots E_{\nu_k}$, on the hypothesis that $E_{\mu_1}\cdots E_{\mu_h}$ have occurred. The $\mu$'s or $\nu$'s may be accented.

Definition 1: If $p(\nu_1, \nu_2) > p(\nu_2)$, we say that the occurrence of the event $E_{\nu_1}$ is favorable to the occurrence of the event $E_{\nu_2}$, or simply that $E_{\nu_1}$ is favorable to $E_{\nu_2}$.


¹ "On the probability of the occurrence of at least $m$ events among $n$ arbitrary events," Annals of Math. Stat., Vol. 12 (1941), pp. 328-338.



If $p(\nu_1, \nu_2) = p(\nu_2)$, we say that $E_{\nu_1}$ is indifferent to $E_{\nu_2}$. If $p(\nu_1, \nu_2) < p(\nu_2)$, we say that $E_{\nu_1}$ is unfavorable to $E_{\nu_2}$.

Thus the relations "favorableness," "indifference," and "unfavorableness" are mutually exclusive and together exhaustive. We state the following immediate consequences:

(i) Reflexivity: An event is favorable to itself; in fact, $p(\nu, \nu) = 1 > p(\nu)$.

(ii) Symmetry: If $E_1$ is favorable (indifferent, unfavorable) to $E_2$, then $E_2$ is favorable (indifferent, unfavorable) to $E_1$. In fact, we have

$$p(1)p(1, 2) = p(12) = p(2)p(2, 1), \qquad \frac{p(1, 2)}{p(2)} = \frac{p(2, 1)}{p(1)}.$$

Thus $p(1, 2) \gtreqless p(2)$ is equivalent to $p(2, 1) \gtreqless p(1)$.

In particular, if $E_1$ is indifferent to $E_2$, then so is $E_2$ to $E_1$. They are then usually said to be independent of each other.

(iii) If $E_1$ is favorable (indifferent, unfavorable) to $E_2$, then $E_1'$ is unfavorable (indifferent, favorable) to $E_2$. For, we have

$$p(1)p(1, 2) + p(1')p(1', 2) = p(12) + p(1'2) = p(2),$$

whence

$$p(1')p(1', 2) = p(2) - p(1)p(1, 2).$$

On the other hand,

$$p(1')p(2) = [1 - p(1)]p(2) = p(2) - p(1)p(2).$$

Since by assumption $p(1')p(2) \neq 0$, we have

$$\frac{p(1', 2)}{p(2)} = \frac{p(2) - p(1)p(1, 2)}{p(2) - p(1)p(2)}.$$

Thus

$$p(1', 2) \lesseqgtr p(2) \quad \text{according as} \quad p(1, 2) \gtreqless p(2).$$

For the sake of brevity we introduce the following symbolic notation:

$$E_1/E_2 = \begin{cases} 1, & \text{if } E_1 \text{ is favorable to } E_2,\\ 0, & \text{if } E_1 \text{ is indifferent to } E_2,\\ -1, & \text{if } E_1 \text{ is unfavorable to } E_2. \end{cases}$$

Then by (ii) and (iii) we have

$$\begin{aligned} E_1/E_2 &= E_2/E_1,\\ E_1'/E_2 = E_2'/E_1 = E_1/E_2' = E_2/E_1' &= -(E_1/E_2),\\ E_1'/E_2' = E_2'/E_1' &= E_1/E_2, \end{aligned}$$

analogous to the rules of signs in the multiplication of integers.
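The symbol $E_1/E_2$ and the sign rules can be checked mechanically on a finite, equally likely sample space. In the sketch below (ours), events are Python sets of outcomes, probabilities are exact fractions, and `rel(e, f)` returns the sign of $p(EF) - p(E)p(F)$, which is equivalent to comparing $p(e, f)$ with $p(f)$ when $p(E) > 0$. The sample space is that of the ball example in (iv).

```python
from fractions import Fraction

OMEGA = frozenset([-11, -10, -3, -2, -1, 2, 4, 6, 11, 13, 16])   # equally likely balls


def p(event):
    return Fraction(len(event), len(OMEGA))


def comp(event):
    """The complementary event E'."""
    return OMEGA - event


def rel(e, f):
    """E/F: 1 if E is favorable to F, 0 if indifferent, -1 if unfavorable."""
    diff = p(e & f) - p(e) * p(f)
    return (diff > 0) - (diff < 0)


E1 = frozenset(x for x in OMEGA if x > 0)          # positive
E2 = frozenset(x for x in OMEGA if x % 2 == 0)     # even
E3 = frozenset(x for x in OMEGA if abs(x) < 10)    # of one digit

for e, f in [(E1, E2), (E1, E3), (E2, E3)]:
    assert rel(e, f) == rel(f, e)                                # symmetry, (ii)
    assert rel(comp(e), f) == rel(e, comp(f)) == -rel(e, f)     # rule from (iii)
    assert rel(comp(e), comp(f)) == rel(e, f)
print(rel(E1, E2), rel(E2, E3), rel(E1, E3))   # 1 1 -1
```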





(iv) Non-transitivity: If $E_1$ is favorable to $E_2$, and $E_2$ is favorable to $E_3$, it does not necessarily follow that $E_1$ is favorable to $E_3$; in fact, it may happen that $E_1$ is unfavorable to $E_3$. For instance, imagine 11 identical balls in a bag marked respectively with the numbers

$$-11, -10, -3, -2, -1, 2, 4, 6, 11, 13, 16.$$

Let a ball be drawn at random. Let

$E_1$ = (the event of the number on the ball being positive),

$E_2$ = (the event of the number on the ball being even),

$E_3$ = (the event of the number on the ball being of 1 digit).

We have

$$\begin{aligned} p(1, 2) &= \tfrac{2}{3} > \tfrac{6}{11} = p(2),\\ p(2, 3) &= \tfrac{2}{3} > \tfrac{6}{11} = p(3),\\ p(1, 3) &= \tfrac{1}{2} < \tfrac{6}{11} = p(3). \end{aligned}$$

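(v) It may happen that $E_1/E_3 = 1$, $E_2/E_3 = 1$, but $E_1E_2/E_3 = -1$. In the example above,

$$\begin{aligned} p(2, 1) &= \tfrac{2}{3} > \tfrac{6}{11} = p(1),\\ p(3', 1) &= \tfrac{3}{5} > \tfrac{6}{11} = p(1),\\ p(23', 1) &= \tfrac{1}{2} < \tfrac{6}{11} = p(1). \end{aligned}$$

(vi) It may happen that $E_1/E_2 = 1$, $E_1/E_3 = 1$, but $E_1/E_2E_3 = -1$. Example:

$$\begin{aligned} p(1, 2) &= \tfrac{2}{3} > \tfrac{6}{11} = p(2),\\ p(1, 3') &= \tfrac{1}{2} > \tfrac{5}{11} = p(3'),\\ p(1, 23') &= \tfrac{1}{6} < \tfrac{2}{11} = p(23'). \end{aligned}$$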
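The conditional probabilities quoted in (iv) to (vi) can be verified with exact fractions. In this sketch (ours), `cond(given, target)` stands for Chung's $p(\text{given}, \text{target})$, the probability of the target event on the hypothesis that the given event has occurred, and `E3c` stands for $E_3'$.

```python
from fractions import Fraction

OMEGA = frozenset([-11, -10, -3, -2, -1, 2, 4, 6, 11, 13, 16])   # equally likely balls


def p(event):
    return Fraction(len(event), len(OMEGA))


def cond(given, target):
    """Chung's p(given, target): probability of `target` on the hypothesis `given`."""
    return p(given & target) / p(given)


E1 = frozenset(x for x in OMEGA if x > 0)          # positive
E2 = frozenset(x for x in OMEGA if x % 2 == 0)     # even
E3 = frozenset(x for x in OMEGA if abs(x) < 10)    # of one digit
E3c = OMEGA - E3                                    # E_3'

rows = [
    ("(iv)  p(1, 2)   vs p(2)", cond(E1, E2), p(E2)),
    ("(iv)  p(2, 3)   vs p(3)", cond(E2, E3), p(E3)),
    ("(iv)  p(1, 3)   vs p(3)", cond(E1, E3), p(E3)),
    ("(v)   p(2, 1)   vs p(1)", cond(E2, E1), p(E1)),
    ("(v)   p(3', 1)  vs p(1)", cond(E3c, E1), p(E1)),
    ("(v)   p(23', 1) vs p(1)", cond(E2 & E3c, E1), p(E1)),
    ("(vi)  p(1, 3')  vs p(3')", cond(E1, E3c), p(E3c)),
    ("(vi)  p(1, 23') vs p(23')", cond(E1, E2 & E3c), p(E2 & E3c)),
]
for label, lhs, rhs in rows:
    sign = ">" if lhs > rhs else "<" if lhs < rhs else "="
    print(f"{label:28s} {lhs} {sign} {rhs}")
```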

(vii) It may happen that $E_1/E_3 = 1$, $E_2/E_3 = 1$, but the disjunction $(E_1 + E_2)/E_3 = -1$. For, by (v) we know that there exist events $E_1'$, $E_2'$, $E_3'$ such that

$$E_1'/E_3' = 1, \qquad E_2'/E_3' = 1, \qquad E_1'E_2'/E_3' = -1.$$

Hence by (iii) there exist events $E_1$, $E_2$, $E_3$ such that

$$E_1/E_3 = 1, \qquad E_2/E_3 = 1, \qquad (E_1'E_2')'/E_3 = -1.$$

But $(E_1'E_2')' = E_1 + E_2$. Thus the last relation is $(E_1 + E_2)/E_3 = -1$.

(viii) It may happen that $E_1/E_2 = 1$, $E_1/E_3 = 1$, but $E_1/(E_2 + E_3) = -1$. This follows from (vi) as (vii) follows from (v).

After all these negative results in (iv)-(viii), we see that we cannot expect to go far without making stronger assumptions regarding the nature of interdependence between the events in the set. Firstly, in view of (iv), we shall restrict ourselves to consideration of a set of events in which each event is favorable to every other. Secondly, in view of (v), we shall only consider the case where the "favorableness," as defined above, shall be cumulative in its effect, that is to say, the more events favorable to a given event have been known to occur, the more probable this given event shall be esteemed. We formulate these two conditions in mathematical terms, as follows:

Definition 2: A set of events $E_1, \ldots, E_n, \ldots$ is said to be strongly mutually favorable (in the first sense) if, for every positive integer $h$ and every set of distinct indices (positive integers) $\mu_1, \ldots, \mu_h$ and $\nu$, we have

$$p(\mu_1\cdots\mu_h, \nu) > p(\mu_1\cdots\mu_{h-1}, \nu).$$

This definition requires that there exist no implication relation between any event and any conjunction of events in the set; in particular, that the events are all distinct. It would be more convenient to consider the relation "favorable or indifferent to." This will be done later on. The present definitions have the advantage of being logically clear-cut and also that of yielding unambiguous inequalities.
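Definition 2 can be tested by brute force on a finite probability space. The sketch below (ours; the function names and the example spaces are hypothetical) checks $p(\mu_1\cdots\mu_h, \nu) > p(\mu_1\cdots\mu_{h-1}, \nu)$ for every choice of distinct indices; since $\mu_h$ may be any element of the set, this amounts to requiring that adjoining any further event to the hypothesis strictly raises the conditional probability of $E_\nu$. The first example is not from the paper: $\theta$ is 1/5 or 4/5 with equal probability, and given $\theta$ three events are independent, each with probability $\theta$. Those events pass the test, while the ball events of (iv) fail it.

```python
from fractions import Fraction
from itertools import combinations, product


def p(space, indices):
    """Probability that every E_i, i in `indices`, occurs.

    `space` maps occurrence patterns (tuples of booleans, one per event) to probabilities.
    """
    return sum((w for pattern, w in space.items() if all(pattern[i] for i in indices)),
               Fraction(0))


def is_strongly_mutually_favorable(space, n):
    """Brute-force check of Definition 2 for the events E_0, ..., E_{n-1}."""
    for nu in range(n):
        others = [i for i in range(n) if i != nu]
        for h in range(1, n):
            for S in combinations(others, h):
                for mu_h in S:
                    rest = tuple(i for i in S if i != mu_h)
                    with_mu = p(space, S + (nu,)) / p(space, S)
                    without = p(space, rest + (nu,)) / p(space, rest)
                    if not with_mu > without:
                        return False
    return True


# Hypothetical mixture: theta = 1/5 or 4/5 (equally likely); given theta the
# three events are independent, each with probability theta.
mixture = {}
for pattern in product((False, True), repeat=3):
    k = sum(pattern)
    mixture[pattern] = sum(Fraction(1, 2) * t ** k * (1 - t) ** (3 - k)
                           for t in (Fraction(1, 5), Fraction(4, 5)))

# The balls of example (iv): events "positive", "even", "of one digit".
balls = [-11, -10, -3, -2, -1, 2, 4, 6, 11, 13, 16]
ball_space = {}
for b in balls:
    key = (b > 0, b % 2 == 0, abs(b) < 10)
    ball_space[key] = ball_space.get(key, Fraction(0)) + Fraction(1, len(balls))

print(is_strongly_mutually_favorable(mixture, 3), is_strongly_mutually_favorable(ball_space, 3))
```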

From Definition 2 we deduce the following consequences:

(1) If the set $(\mu_1^*, \ldots, \mu_j^*)$ is a proper sub-set of $(\mu_1, \ldots, \mu_h)$, we have

$$p(\mu_1\cdots\mu_h, \nu) > p(\mu_1^*\cdots\mu_j^*, \nu).$$

(2) For any positive integer $h$ and any two sets $(\nu_1, \ldots, \nu_k)$ and $(\mu_1, \ldots, \mu_h)$ where all the indices are distinct, we have

$$p(\mu_1\cdots\mu_h, \nu_1\cdots\nu_k) > p(\mu_1\cdots\mu_{h-1}, \nu_1\cdots\nu_k).$$

More generally, we have as in (1),

$$p(\mu_1\cdots\mu_h, \nu_1\cdots\nu_k) > p(\mu_1^*\cdots\mu_j^*, \nu_1\cdots\nu_k).$$

Proof: We have only to prove the first inequality. For $k = 1$ this is the assumption in Definition 2. Suppose that the inequality holds for $k - 1$; we shall prove that it holds for $k$, too.

$$\begin{aligned} \frac{p(\mu_1\cdots\mu_h, \nu_1\cdots\nu_k)}{p(\mu_1\cdots\mu_{h-1}, \nu_1\cdots\nu_k)} &= \frac{p(\mu_1\cdots\mu_{h-1})\,p(\mu_1\cdots\mu_h\nu_1\cdots\nu_k)}{p(\mu_1\cdots\mu_h)\,p(\mu_1\cdots\mu_{h-1}\nu_1\cdots\nu_k)}\\ &= \frac{p(\mu_1\cdots\mu_{h-1})\,p(\mu_1\cdots\mu_h)\,p(\mu_1\cdots\mu_h, \nu_1)\,p(\mu_1\cdots\mu_h\nu_1, \nu_2\cdots\nu_k)}{p(\mu_1\cdots\mu_h)\,p(\mu_1\cdots\mu_{h-1})\,p(\mu_1\cdots\mu_{h-1}, \nu_1)\,p(\mu_1\cdots\mu_{h-1}\nu_1, \nu_2\cdots\nu_k)}\\ &= \frac{p(\mu_1\cdots\mu_h, \nu_1)}{p(\mu_1\cdots\mu_{h-1}, \nu_1)}\cdot\frac{p(\mu_1\cdots\mu_h\nu_1, \nu_2\cdots\nu_k)}{p(\mu_1\cdots\mu_{h-1}\nu_1, \nu_2\cdots\nu_k)} > 1, \end{aligned}$$

the first factor exceeding 1 by Definition 2 and the second by the hypothesis of induction.





Observe that none of the denominators vanish by our original assumption and by Definition 2.

Therefore we see that when the failure in (v) is remedied by our definition, 
the failure in (vi) is automatically remedied too. 


2. Theorem 1: Let $n > 1$ and let $E_1, \ldots, E_n, \ldots$ be a set of strongly mutually favorable events (in the first sense). Then we have, for $k = 1, \ldots, n - 1$,

$$\prod\big[p(\nu_1\cdots\nu_{k+1})\big]^{1/\binom{n-1}{k}} > \prod\big[p(\nu_1\cdots\nu_k)\big]^{1/\binom{n-1}{k-1}},$$

where the products extend respectively to all combinations of $k + 1$ and $k$ distinct indices out of the indices $1, \ldots, n$.
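Theorem 1 can be illustrated numerically. The sketch below (ours) uses a hypothetical two-state model: with probability 1/2 each, the four events are independent with probabilities `q[0][i]` or `q[1][i]`, every event being likelier in the second state. Such events are strongly mutually favorable, since adjoining an event to the hypothesis shifts weight toward the second state and so raises every conditional probability. The code compares the logarithms of the two sides of the theorem for each $k$.

```python
import math
from itertools import combinations

# Hypothetical two-state example: in state s (probability w[s]) the events are
# independent with probabilities q[s][i]; every event is likelier in state 1.
w = [0.5, 0.5]
q = [[0.1, 0.3, 0.2, 0.4], [0.7, 0.9, 0.6, 0.8]]
n = 4


def p_conj(indices):
    """p(nu_1 ... nu_j): probability that all listed events occur."""
    return sum(ws * math.prod(qs[i] for i in indices) for ws, qs in zip(w, q))


def theorem1_sides(k):
    """Logarithms of the left and right sides of Theorem 1 (1 <= k <= n - 1)."""
    lhs = sum(math.log(p_conj(c)) for c in combinations(range(n), k + 1)) / math.comb(n - 1, k)
    rhs = sum(math.log(p_conj(c)) for c in combinations(range(n), k)) / math.comb(n - 1, k - 1)
    return lhs, rhs


for k in range(1, n):
    lhs, rhs = theorem1_sides(k)
    print(k, round(lhs, 4), round(rhs, 4), lhs > rhs)
```

For independent events the two sides would be equal, because each $\log p(\nu_i)$ then appears with the same total weight on both sides.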

Proof. We may assume that the indices are written so that

$$1 \le \nu_1 < \nu_2 < \cdots < \nu_{k+1} \le n.$$

Taking logarithms, we have

$$\binom{n-1}{k-1}\sum\log p(\nu_1\cdots\nu_{k+1}) > \binom{n-1}{k}\sum\log p(\nu_1\cdots\nu_k).$$

Substituting from the obvious formula

$$p(\nu_1\cdots\nu_k) = p(\nu_1)\,p(\nu_1, \nu_2)\,p(\nu_1\nu_2, \nu_3)\cdots p(\nu_1\cdots\nu_{k-1}, \nu_k),$$

and writing $\log p(\cdots) = q(\cdots)$, the inequality becomes

$$\binom{n-1}{k-1}\sum\big[q(\nu_1) + q(\nu_1, \nu_2) + \cdots + q(\nu_1\cdots\nu_k, \nu_{k+1})\big] > \binom{n-1}{k}\sum\big[q(\nu_1) + q(\nu_1, \nu_2) + \cdots + q(\nu_1\cdots\nu_{k-1}, \nu_k)\big]. \tag{1}$$

Immediately we observe that the number of terms of the form $q(\mu_1\cdots\mu_s, \mu)$ $(0 \le s \le \mu - 1)$ with a fixed $\mu$ after the comma in the bracket is the same on both sides, since each combination containing $\mu$ contributes exactly one such term, so that the two counts are

$$\binom{n-1}{k-1}\binom{n-1}{k} = \binom{n-1}{k}\binom{n-1}{k-1}. \tag{2}$$

Let the sums of such $q$'s on the left and right of (1) be $\sigma^{(1)} = \sigma^{(1)}(\mu)$ and $\sigma^{(2)} = \sigma^{(2)}(\mu)$ respectively. To prove our theorem it is sufficient to prove that $\sigma^{(1)}(\mu) \ge \sigma^{(2)}(\mu)$ for every $\mu$ and $\sigma^{(1)}(\mu) > \sigma^{(2)}(\mu)$ for at least one $\mu$.

Now the terms in $\sigma^{(1)}$ (or $\sigma^{(2)}$) fall into classes according to the number $s$ of the $\mu$'s before the comma in the bracket. Let those terms having $s$ $\mu$'s before the comma belong to the $s$-th class. It is evident that the number of terms of the $s$-th class in $\sigma^{(1)}(\mu)$ is equal to

$$\binom{n-1}{k-1}\binom{\mu-1}{s}\binom{n-\mu}{k-s}$$





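for $s = 0, 1, \ldots, \mu - 1$; where we make the convention that

$$\binom{a}{0} = 1, \qquad \binom{a}{b} = 0 \text{ if } a < b \text{ or if } b < 0.$$

Thus for a fixed $\mu$, when the terms in $\sigma^{(1)}(\mu)$ are classified in the above manner, its total number of terms may be written as the following sum, in which vanishing terms may occur:

$$\binom{n-1}{k-1}\binom{n-1}{k} = \binom{n-1}{k-1}\Big\{\binom{\mu-1}{\mu-1}\binom{n-\mu}{k-\mu+1} + \cdots + \binom{\mu-1}{s}\binom{n-\mu}{k-s} + \cdots + \binom{\mu-1}{0}\binom{n-\mu}{k}\Big\}.$$

Similarly the total number of terms in $\sigma^{(2)}(\mu)$ may be written as the following sum:

$$\binom{n-1}{k}\binom{n-1}{k-1} = \binom{n-1}{k}\Big\{\binom{\mu-1}{\mu-1}\binom{n-\mu}{k-\mu} + \cdots + \binom{\mu-1}{s}\binom{n-\mu}{k-s-1} + \cdots + \binom{\mu-1}{0}\binom{n-\mu}{k-1}\Big\}.$$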

Lemma 1: For $0 \le s \le k$, we have, taking account of our conventions about the binomial coefficients,

$$\binom{n-1}{k-1}\binom{n-\mu}{k-s} \ge \binom{n-1}{k}\binom{n-\mu}{k-s-1} \quad \text{if } s > (\mu-1)k/n, \tag{3}$$

$$\binom{n-1}{k-1}\binom{n-\mu}{k-s} \le \binom{n-1}{k}\binom{n-\mu}{k-s-1} \quad \text{if } s \le (\mu-1)k/n. \tag{4}$$

Proof: Suppose $s \ge k - n + \mu$; then

$$\binom{n-1}{k-1}\binom{n-\mu}{k-s} \gtreqless \binom{n-1}{k}\binom{n-\mu}{k-s-1}$$

according as

$$\frac{k}{n-k} \gtreqless \frac{k-s}{n-\mu-k+s+1},$$

i.e. according as

$$s \gtreqless (\mu-1)k/n.$$

But, since $k < n$ and $\mu \le n$, we have

$$n - k - \frac{k}{n} + 1 \ge \frac{(n-k)\mu}{n}, \qquad \text{i.e.} \qquad \frac{(\mu-1)k}{n} \ge k - n + \mu - 1.$$

Therefore if $s > (\mu-1)k/n$, then $s > k - n + \mu - 1$, that is, $s \ge k - n + \mu$, and (3) holds.

Again, if $k - n + \mu \le s \le (\mu-1)k/n$, then (4) holds; while if $s < k - n + \mu$, then the left-hand side of (4) vanishes while the right-hand side is non-negative; thus (4) holds for $s \le (\mu-1)k/n$. The lemma is proved.
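Lemma 1, together with the vanishing total and the partial-sum inequality (5) used below, concerns integers only and can be checked exhaustively for small $n$. The sketch below (ours; the helper names are hypothetical) uses a binomial coefficient that follows the paper's convention (zero when $b < 0$ or $b > a$), takes $d_s$ as the left side minus the right side of (3) and (4), and compares $s$ with $(\mu-1)k/n$ through the exact integer test $sn > (\mu-1)k$.

```python
from math import comb


def C(a: int, b: int) -> int:
    """Binomial coefficient with the convention C(a, b) = 0 if b < 0 or b > a."""
    return comb(a, b) if 0 <= b <= a else 0


def d(n, k, mu, s):
    """Left side minus right side of (3)-(4)."""
    return C(n - 1, k - 1) * C(n - mu, k - s) - C(n - 1, k) * C(n - mu, k - s - 1)


def check_lemma1(max_n: int) -> bool:
    for n in range(2, max_n + 1):
        for k in range(1, n):
            for mu in range(1, n + 1):
                for s in range(k + 1):
                    ds = d(n, k, mu, s)
                    if s * n > (mu - 1) * k and ds < 0:      # (3) violated
                        return False
                    if s * n <= (mu - 1) * k and ds > 0:     # (4) violated
                        return False
                weighted = [C(mu - 1, s) * d(n, k, mu, s) for s in range(mu)]
                if sum(weighted) != 0:                        # total must vanish
                    return False
                partial = 0
                for term in weighted:                          # partial sums must stay <= 0
                    partial += term
                    if partial > 0:
                        return False
    return True


print(check_lemma1(12))
```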

If we put $(s = 0, 1, \ldots, k)$

$$d_s = \binom{n-1}{k-1}\binom{n-\mu}{k-s} - \binom{n-1}{k}\binom{n-\mu}{k-s-1},$$

then by Lemma 1,

$$d_s \ge 0 \text{ if } s > (\mu-1)k/n, \qquad d_s \le 0 \text{ if } s \le (\mu-1)k/n.$$

This means that although the total number of terms of the form $q(\mu_1\cdots\mu_s, \mu)$ is the same on both sides of (1), the left-hand side is more abundant in terms with larger $s$, while the right-hand side is more abundant in terms with smaller $s$. Now we have

$$q(\mu_1\cdots\mu_i, \mu) > q(\mu_1^*\cdots\mu_j^*, \mu)$$

if $i > j$ and if $(\mu_1^*\cdots\mu_j^*)$ is a subset of $(\mu_1\cdots\mu_i)$. Hence it is natural to suppose that the left-hand side must be greater because it is more abundant in terms of larger values. Unfortunately, even if $i > j$, the last inequality is in general not true if the set $(\mu_1^*\cdots\mu_j^*)$ is not a sub-set of $(\mu_1\cdots\mu_i)$. Therefore we cannot as yet conclude that $\sigma^{(1)} \ge \sigma^{(2)}$.

To prove that actually we have $\sigma^{(1)} \ge \sigma^{(2)}$, we make the following "process of compensation":

We have, by (2) and the definition of $d_s$, the following equality:

$$\binom{\mu-1}{0}d_0 + \binom{\mu-1}{1}d_1 + \cdots + \binom{\mu-1}{\mu-1}d_{\mu-1} = 0,$$

where $d_j = 0$ if $j > k$. Since $d_s \le 0$ for $s \le k(\mu-1)/n$ and $d_s \ge 0$ for $s > k(\mu-1)/n$, the non-positive terms of this sum all precede the non-negative ones, and hence

$$\binom{\mu-1}{0}d_0 + \binom{\mu-1}{1}d_1 + \cdots + \binom{\mu-1}{l}d_l \le 0 \qquad \text{for } l = 0, 1, \ldots, \mu - 1. \tag{5}$$





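For the fixed $\mu$, let

$$\begin{aligned} \rho_l^{(1)} &= \binom{n-1}{k-1}\Big\{\binom{n-\mu}{k}q(\mu) + \binom{n-\mu}{k-1}\sum_{1\le\mu_1<\mu} q(\mu_1, \mu) + \cdots + \binom{n-\mu}{k-l}\sum_{1\le\mu_1<\cdots<\mu_l<\mu} q(\mu_1\cdots\mu_l, \mu)\Big\},\\ \rho_l^{(2)} &= \binom{n-1}{k}\Big\{\binom{n-\mu}{k-1}q(\mu) + \binom{n-\mu}{k-2}\sum_{1\le\mu_1<\mu} q(\mu_1, \mu) + \cdots + \binom{n-\mu}{k-l-1}\sum_{1\le\mu_1<\cdots<\mu_l<\mu} q(\mu_1\cdots\mu_l, \mu)\Big\}, \end{aligned}$$

so that

$$\sigma^{(1)}(\mu) = \rho_{\mu-1}^{(1)}, \qquad \sigma^{(2)}(\mu) = \rho_{\mu-1}^{(2)}.$$

For $\mu = 1$, $l = 0$, we have

$$\sigma^{(1)}(1) = \rho_0^{(1)} = \rho_0^{(2)} = \sigma^{(2)}(1).$$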

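Lemma 2: For $\mu > 1$ and $0 \le l < \mu - 1$, we have

$$\sum_{1\le\mu_1<\cdots<\mu_l<\mu} q(\mu_1\cdots\mu_l, \mu) < \frac{l+1}{\mu-l-1}\sum_{1\le\mu_1<\cdots<\mu_{l+1}<\mu} q(\mu_1\cdots\mu_{l+1}, \mu).$$

Proof: We have, for any $\nu < \mu$, $\nu \neq \mu_i$ $(i = 1, \ldots, l)$,

$$q(\mu_1\cdots\mu_l\nu, \mu) > q(\mu_1\cdots\mu_l, \mu).$$

Summing with respect to all such $\nu$'s,

$$\sum_\nu q(\mu_1\cdots\mu_l\nu, \mu) > (\mu - l - 1)\,q(\mu_1\cdots\mu_l, \mu).$$

Summing with respect to all $1 \le \mu_1 < \cdots < \mu_l < \mu$,

$$\sum\sum_\nu q(\mu_1\cdots\mu_l\nu, \mu) = (l+1)\sum_{1\le\mu_1<\cdots<\mu_{l+1}<\mu} q(\mu_1\cdots\mu_{l+1}, \mu) > (\mu - l - 1)\sum_{1\le\mu_1<\cdots<\mu_l<\mu} q(\mu_1\cdots\mu_l, \mu).$$

The lemma is proved.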

Now we use induction to prove that for p > 1 and Z = 1, 1 


(i) (J) .. 

pi ~ Pi > 


do + ( M i *)& + j 

CrO 


x E gOn ••• mi , m ) so. 

i£^i< • </*i <e 

This inequality holds for Z = 1 by Lemma 2. Assume that it holds for 
l, (Z < m — 1). Then we have, by (5) and the fact that each g < 0, 



346 


KAI-LAI Hit’NCi 


(1) (S) _ ID IS) , J V \ 

PHI — PM! — Pi ~~ Pt -r Oj-H L, t ■ • • Pl+l , pj 

15 m<" <hh<ii 


do 4" 


p - 1 
1 


CT 1 ) 


ei 1 ) 


(u 


2 ■ ■ ■ pi I p) 

+ rf/u 2 <?0n • • * pi+i, p) 


d« + (^ j ^ri, + j 


di 


H i + d '* 1 ) -2 ^ 1 " Mi+i, p) 


(V) 

*+(% 1 V' + ■ • ■ + ti') * + ( j + 0 * 

'W 


ZffO.1- ■ Pi+1> p) § 0, 


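Therefore, for $\mu > 1$, we have

$$\sigma^{(1)}(\mu) - \sigma^{(2)}(\mu) = \rho^{(1)}_{\mu-1} - \rho^{(2)}_{\mu-1} > 0.$$

Since $n > 1$ and $1 \le k < n$, there exists a $\mu > 1$. Hence

$$\sum_{\mu=1}^{n} \sigma^{(1)}(\mu) > \sum_{\mu=1}^{n} \sigma^{(2)}(\mu),$$

which is equivalent to the inequality (1).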


3. Our next step will be to obtain a generalization of Theorem 1. Consider
a derived event defined by a disjunction of a (finite) number of events in the
set, as follows:

$$E_{\nu_1} + E_{\nu_2} + \cdots + E_{\nu_m}.$$

We call such a disjunction a disjunction of the $m$-th order.

Definition 3: A set of events is said to be strongly mutually favorable in the
second sense if for every positive integer $m$, the derived set of events consisting of
all the disjunctions of the $m$-th order forms a strongly mutually favorable set of
events (in the first sense).

Let $D = D(m)$ denote in general a disjunction of the $m$-th order, and let
$p(D_1 \cdots D_h, D)$ denote the probability of the occurrence of the disjunction
$D$ on the hypothesis that the conjunction of the $h$ disjunctions $D_1 \cdots D_h$ has
occurred. Then Definition 3 says that for any positive integer $h$ and any set
of distinct $D$'s we have

$$p(D_1 \cdots D_h, D) > p(D_1 \cdots D_{h-1}, D).$$

Since a disjunction of the 1st order is an event $E$, we see that Definition 3
includes Definition 2.





Let $D_m(\nu_1, \ldots, \nu_k)$, $\nu_1 < \cdots < \nu_k$, denote the derived event

$$\prod_{\mu_1 < \cdots < \mu_m} (E_{\mu_1} + \cdots + E_{\mu_m}),$$

where the product (conjunction) extends to all combinations of $m$ indices
out of the indices $\nu_1, \ldots, \nu_k$. Let $p_m^*(\nu_1, \ldots, \nu_k)$ denote the probability of
the occurrence of $D_m(\nu_1, \ldots, \nu_k)$. It is seen that $p_1^*(\nu_1, \ldots, \nu_k) = p(\nu_1 \cdots \nu_k)$
in our previous notation.

We merely state Theorem 2, whose proof is analogous to that of Theorem 1
but requires more cumbersome expressions.

Theorem 2: Let $n > k \ge m \ge 1$ and let $E_1, \ldots, E_n$ be a set of strongly
mutually favorable events in the second sense. Then we have

( n —Tfv \ —1 

II [pt(n, • • •, *x +1 )] 

> n \v* m {vu 1 . 

1§>T<" <'li» 

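To give an interpretation of $p_m^*(\nu_1, \ldots, \nu_k)$, we prove the following symbolic equation
between events:

$$D_m = \prod_{\mu_1 < \cdots < \mu_m} (E_{\mu_1} + \cdots + E_{\mu_m}) = \sum_{\lambda_1 < \cdots < \lambda_{k-m+1}} (E_{\lambda_1} \cdots E_{\lambda_{k-m+1}}) = C_{k-m+1},$$

where product means conjunction and sum means disjunction.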

To prove this, we write, for a general event $E$, $E = 1$ when $E$ occurs and $E = 0$
when $E$ does not occur. Now if $C_{k-m+1} = 0$, then at most $k - m$ events among
the $k$ given events occur, so that there exist $m$ events such that $E_{\lambda_1} = 0$, $E_{\lambda_2} = 0$, $\ldots$,
$E_{\lambda_m} = 0$; thus

$$E_{\lambda_1} + E_{\lambda_2} + \cdots + E_{\lambda_m} = 0.$$

Now the last disjunction is contained in $D_m$ as a factor; therefore $D_m = 0$.
Conversely, if $D_m = 0$, at least one of its factors is $0$, so that there exist $m$
events such that $E_{\lambda_1} = 0$, $E_{\lambda_2} = 0$, $\ldots$, $E_{\lambda_m} = 0$. Thus at most $k - m$ events
out of the $k$ given events occur, and so by definition $C_{k-m+1} = 0$. Q.e.d.
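The equivalence just proved can be checked mechanically by encoding each event as $0$ or $1$, exactly as in the proof. The Python sketch below is an illustration and not part of the original argument. It compares $D_m$ (the conjunction, over all $m$-subsets of the $k$ events, of the corresponding disjunctions) with $C_{k-m+1}$ (at least $k-m+1$ of the $k$ events occur) on all $2^k$ outcomes.

```python
from itertools import combinations, product

def D(e, m):
    """D_m: for every m-subset of the k events, at least one event in it occurs."""
    k = len(e)
    return all(any(e[i] for i in sub) for sub in combinations(range(k), m))

def C(e, r):
    """C_r: at least r of the k events occur."""
    return sum(e) >= r

def identity_holds(k, m):
    # Compare D_m with C_{k-m+1} on every 0/1 assignment of the k events.
    return all(D(e, m) == C(e, k - m + 1) for e in product((0, 1), repeat=k))

assert all(identity_holds(k, m) for k in range(1, 8) for m in range(1, k + 1))
```

The two extreme cases are familiar. For $m = 1$ both sides say that all $k$ events occur, and for $m = k$ both sides say that at least one occurs.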

From the above it immediately follows that

$$p_m^*(\nu_1, \ldots, \nu_k) = p_{k-m+1}(\nu_1, \ldots, \nu_k),$$

where $p_{k-m+1}(\nu_1, \ldots, \nu_k)$ is defined in the Introduction. Then Theorem 2
may be written as

, Pfcn )] 1 > nfo^iO-j, ■■■.n)l ^ 


or again as 








(W 


-1 





where $p'_{m-1}(\nu_1, \ldots, \nu_k)$ denotes the probability of the occurrence of at most
$m - 1$ events out of the $k$ events $E'_{\nu_1}, \ldots, E'_{\nu_k}$.

Remark. If in our Definitions 2 and 3 we replace the sign ">" by the sign
"$\ge$", then we obtain the inequalities in Theorems 1 and 2 with the sign ">"
replaced by "$\ge$". The corresponding set of events thus newly defined will be
said to be strongly mutually favorable or indifferent (in the first or second sense).

After this modification, we can include events with probability 1 in our
considerations. Also, the events need no longer be distinct, and there may
now exist implication relations between events or their conjunctions. This
modification is useful for the following application.

4. Consider the divisibility of a random positive integer by the set of positive
integers. To each positive integer there corresponds an event, namely the event
that the random positive integer is divisible by it. The enumerable set of events

$$E_1, E_2, E_3, E_4, \ldots, E_n, \ldots,$$

where $E_n$ = the event of divisibility by $n$, with the probabilities

$$1, \frac{1}{2}, \frac{1}{3}, \frac{1}{4}, \ldots, \frac{1}{n}, \ldots,$$

evidently forms a set of strongly mutually favorable or indifferent events in
the second sense.

Again, the enumerable set of events $E'_n$, where $E'_n$ = the event of non-divisibility by $n$, with the probabilities

$$\frac{1}{2}, \frac{2}{3}, \frac{3}{4}, \ldots, \frac{n-1}{n}, \ldots,$$

evidently also forms a set of strongly mutually favorable or indifferent events
in the second sense.

Hence our Theorem 2 can be applied to both sets, and in this way we obtain
results which belong properly to the elementary theory of numbers.

We shall content ourselves with indicating a few examples.

Let $\{a_1, \ldots, a_n\}$ denote the least common multiple of the natural numbers
$a_1, \ldots, a_n$. Then Theorem 1, when applied to the two sets above, gives
respectively

Theorem 1.1: Let $a_1, \ldots, a_n$ be any positive integers. Then we have,
for $k = 1, \ldots, n - 1$,


( n —--V" 


£ t n _^_y-r. 

\is»i<-"<h^k (ax,) • • ■, a, t }J 






Theorem 1.2: Also we have, 


i in 


n (i- E -+ £ _L_ 

< <n+i£n \ n£siSn+i a M n^iOn^n-n {a ( , l , a Ml j 






{®n ) ''" > R| 


1 \(vV 


n+i 


£ n (i- E i+ £ _L_ 

15 » 1 < •■<»*«■ \ HSHSnOn \a H , o„} 


- + ■•• + (- 1 )* 
A trivial corollary of Theorem 1 is

$$p(12 \cdots n) \ge p_1 p_2 \cdots p_n.$$

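Correspondingly we have

$$1 - \sum_{1 \le \mu \le n} \frac{1}{a_\mu} + \sum_{1 \le \mu_1 < \mu_2 \le n} \frac{1}{\{a_{\mu_1}, a_{\mu_2}\}} - \cdots + (-1)^n \frac{1}{\{a_1, \ldots, a_n\}} \ge \left(1 - \frac{1}{a_1}\right)\left(1 - \frac{1}{a_2}\right) \cdots \left(1 - \frac{1}{a_n}\right).$$

If we multiply by $a_1 a_2 \cdots a_n$, we get

$$A(a_1, a_2, \ldots, a_n) \ge (a_1 - 1)(a_2 - 1) \cdots (a_n - 1),$$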

where $A(a_1, \ldots, a_n)$ denotes the number of positive integers $\le a_1 a_2 \cdots a_n$
that are not divisible by any of the $a_i$ $(i = 1, \ldots, n)$.

This last result, which is almost obvious here, was proved by H. Rohrbach
and H. Heilbronn independently.² See also my generalization³ (also obvious
from the present point of view) of this result to higher dimensional sets of
positive integers and to sets of ideals in any algebraic number field.
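The inequality $A(a_1, \ldots, a_n) \ge (a_1 - 1) \cdots (a_n - 1)$ is easy to test by brute force for small integers. The Python sketch below needs Python 3.9 or later for `math.lcm`. It counts $A$ directly, recomputes it from the alternating least-common-multiple sum used above (scaled by $a_1 \cdots a_n$), and checks the lower bound. It only illustrates the statement and is not a proof.

```python
from itertools import combinations, product
from math import lcm, prod

def A(a):
    """Brute-force count of 1 <= x <= a_1*...*a_n divisible by none of the a_i."""
    top = prod(a)
    return sum(1 for x in range(1, top + 1) if all(x % ai for ai in a))

def A_inclusion_exclusion(a):
    """Same count from the alternating lcm sum, multiplied by a_1*...*a_n."""
    top = prod(a)
    total = 0
    for r in range(len(a) + 1):
        for sub in combinations(a, r):
            total += (-1) ** r * (top // lcm(*sub))  # lcm() with no arguments is 1
    return total

for a in product(range(1, 7), repeat=3):
    assert A(a) == A_inclusion_exclusion(a)
    assert A(a) >= prod(ai - 1 for ai in a)
```

For example, `A((4, 6))` is 16, while $(4-1)(6-1) = 15$. Equality can occur, as with `A((2, 3)) == 2`.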


² "Beweis einer zahlentheoretischen Ungleichung," Jour. für Math., Vol. 177 (1937),
pp. 193-196; "On an inequality in the elementary theory of numbers," Proc. Camb. Phil.
Soc., Vol. 33 (1937), pp. 207-209.

³ "A generalization of an inequality in the elementary theory of numbers," Jour. für
Math., Vol. 183 (1941), p. 103.



OBSERVATIONS ON ANALYSIS OF VARIANCE THEORY 

By Hilda Geiringer¹
Bryn Mawr College

One of the important problems of theoretical statistics is the following. Let
$x_1, x_2, \ldots, x_N$ be the results of $N$ observations; by means of these results we
want to test the hypothesis that $V_i(x)$ is the distribution of the $i$th chance
variable $x_i$. In that situation we often decide to choose a test function
$F(x_1, x_2, \ldots, x_N)$ and to determine the distribution of $F$ under the above assumption.
By means of this distribution we compute the probability of $\xi_1 \le F \le \xi_2$
and compare this result with the observed value of $F$.

Suppose there are $m$ groups, each of $n$ observations on $m \cdot n$ chance variables
$x_{\mu\nu}$. We may test hypotheses regarding the $mn$ distributions of the $x_{\mu\nu}$ in the
way just mentioned. In analysis of variance theory we often use as test functions
certain quadratic forms $s_w^2$ and $s_a^2$ ("variance within" and "among classes")
and their quotient (multiplied by $m(n-1)/(m-1)$), usually denoted by $z$.
Its distribution has been investigated by R. A. Fisher [2] under the assumption
that the chance variables are mutually independent and subject to the same
normal law. "The five per cent and one per cent points of this distribution
have been tabulated by R. A. Fisher and are used to test whether these two
estimates of the same magnitude are significantly different. One gets thus a
test of significance to test whether our sample is a random sample from a homogeneous
normal population or not? If the probability of a certain z-value is too
small we shall reject the hypothesis that the sample is a random sample from a
homogeneous normal population" [5].

The use of Fisher's z-test is also recommended if we may reasonably assume
that the theoretical distributions are approximately normal. "Unless some
rather startling lack of normality is known or suspected, analysis of variance may
be used with confidence." This last remark can be understood by considering
that, as we will see in detail, some of the basic results of our theory are independent
of the normality of the populations. It is, however, this assumption of
normality which makes possible the complete and elegant solution of the problem
of distribution obtained by R. A. Fisher.

If it is not possible to determine the exact distribution of a test function under
sufficiently general assumptions, we may:

a) make simple and particular assumptions concerning the populations;

b) investigate an asymptotic solution of the problem, i.e. determine the distributions
of the test functions for large samples,³ or

c) study the mathematical expectations and the variances of the test functions

¹ Research under a grant in aid of the American Philosophical Society.

² My italics.

³ Cf. statement (a) in Section 1, (c).





ANALYSIS OF VARIANCE




for small samples under appropriately general assumptions regarding the populations
(this should be done independently of concepts of estimation, unbiased
estimate, etc.).

This last procedure provides us with tests which suffice in actual practice.⁴

It is well known that the expectations of the two forms $s_a^2/(m-1)$ and
$s_w^2/m(n-1)$ are the same even if the populations are not normal, provided only that
they are equal to each other (Bernoulli series). In addition we shall prove the theorem, familiar in
the case of the Lexis quotient [9], that under these conditions the expectation of their
quotient equals unity (Section 1, (b)). The next step consists in investigating
certain inequalities characteristic of Lexis or Poisson series. The different
criteria will be completed by the computation of the respective variances (Section
1, (c)).

In addition to the above mentioned test functions, other symmetrical test
functions have been considered [5]. In studying these we shall again assume
general populations. It will be seen that the Lexis as well as the Poisson series
may be characterized by equalities (instead of inequalities) (Section 2, (a)), and
we can generalize our theorem on the expectation of the quotient (Section 2, (b))
to this case. Then the variances of these test functions will be investigated.

It seems worthwhile to omit the assumption of independence of the chance
variables and to study different kinds of mutual dependence. These investigations
lead to interesting relations among the expectations⁵ (Section 2, (c)).
They seem to be related to Fisher's "intraclass correlation" and to supplement
his idea.

Most of the results of Sections 1 and 2 can be generalized to the analysis of
covariance (Section 3).


1. Variance within and among classes. 

(a). The test functions. Let $x_{\mu\nu}$ $(\mu = 1, \ldots, m;\ \nu = 1, \ldots, n)$ be $m \cdot n$ chance
variables and put

$$a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} x_{\mu\nu}, \qquad a = \frac{1}{mn}\sum_{\mu=1}^{m}\sum_{\nu=1}^{n} x_{\mu\nu} = \frac{1}{m}\sum_{\mu=1}^{m} a_\mu. \tag{1}$$


⁴ The important paper of Irwin [5] assumes normality of the populations. H. L. Rietz
[8] computes the expectations of $s_a^2$ and $s_w^2$ under rather general assumptions for the populations
and considers the cases of Bernoulli, Lexis and Poisson series, but does not consider tests
of significance; nor does he consider the symmetric test functions (Section 2 of this paper).
In later papers on our subject the assumption of normal and independent populations
recurs. Another approach [11] to the problem of analysis of variance is to use ranks instead
of the actual values (this has been pointed out to the author by the referee, to whom the author is very
grateful for this comment).

⁵ They generalize previous results of the author.





HILDA GEIRINGER


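We then introduce the three quadratic forms

$$s^2 = \sum_{\mu}\sum_{\nu} (x_{\mu\nu} - a)^2, \qquad s_a^2 = n\sum_{\mu} (a_\mu - a)^2, \qquad s_w^2 = \sum_{\mu}\sum_{\nu} (x_{\mu\nu} - a_\mu)^2, \tag{2}$$

with the respective ranks (degrees of freedom)

$$r = mn - 1, \qquad r_a = m - 1, \qquad r_w = m(n - 1). \tag{3}$$

Then we have

$$s^2 = s_a^2 + s_w^2, \qquad r = r_a + r_w. \tag{4}$$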

The $m \cdot n$ theoretical distributions are assumed in this section to be independent
of each other. Let $V_{\mu\nu}(x)$ be the probability that $x_{\mu\nu} \le x$ and

$$\alpha_{\mu\nu} = \int x\,dV_{\mu\nu}(x), \qquad \sigma_{\mu\nu}^2 = \int (x - \alpha_{\mu\nu})^2\,dV_{\mu\nu}(x), \tag{5}$$

where the integrals are Stieltjes integrals; thus the $V_{\mu\nu}(x)$ may be, e.g., general
arithmetical or geometrical distributions.⁶

Let us compute the mathematical expectation of the three test functions with
respect to the $mn$-dimensional distribution

$$V_{11}(x_{11})\,V_{12}(x_{12}) \cdots V_{mn}(x_{mn}):$$

$$E[F(x_{11}, \ldots, x_{mn})] = \int \cdots \int F(x_{11}, \ldots, x_{mn})\,dV_{11}(x_{11}) \cdots dV_{mn}(x_{mn}). \tag{6}$$


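We have then

$$E\left[\frac{s^2}{mn-1}\right] = \frac{1}{mn}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2 + \frac{1}{mn-1}\sum_{\mu}\sum_{\nu}(\alpha_{\mu\nu} - \alpha)^2, \tag{7}$$

$$E\left[\frac{s_a^2}{m-1}\right] = \frac{1}{mn}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2 + \frac{n}{m-1}\sum_{\mu}(\alpha_\mu - \alpha)^2, \tag{8}$$

$$E\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{mn}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2 + \frac{1}{m(n-1)}\sum_{\mu}\sum_{\nu}(\alpha_{\mu\nu} - \alpha_\mu)^2, \tag{9}$$

where $\alpha_\mu = \frac{1}{n}\sum_\nu \alpha_{\mu\nu}$ and $\alpha = \frac{1}{m}\sum_\mu \alpha_\mu$.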
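Formulas (7)-(9) can be confirmed exactly on a small example in which every $V_{\mu\nu}$ is a discrete distribution with finitely many atoms, so that the integral (6) becomes a finite sum. The Python sketch below makes that assumption. The two-point cell distributions are arbitrary illustrative choices, and `forms`, `exact_expectations` and `formulas` are names introduced here. It enumerates every outcome with exact rational arithmetic and compares the result with the right-hand sides of (7), (8) and (9).

```python
from fractions import Fraction as F
from itertools import product

def forms(x):
    """s^2, s_a^2, s_w^2 of (2) for an m x n table x given as a list of rows."""
    m, n = len(x), len(x[0])
    a = sum(map(sum, x)) / F(m * n)
    a_mu = [sum(row) / F(n) for row in x]
    s2 = sum((v - a) ** 2 for row in x for v in row)
    sa2 = n * sum((am - a) ** 2 for am in a_mu)
    sw2 = sum((v - a_mu[i]) ** 2 for i, row in enumerate(x) for v in row)
    return s2, sa2, sw2

def exact_expectations(cells):
    """Left-hand sides of (7)-(9); cells[mu][nu] lists (value, probability) atoms."""
    m, n = len(cells), len(cells[0])
    flat = [cell for row in cells for cell in row]
    e7 = e8 = e9 = F(0)
    for outcome in product(*flat):          # independent cells: product measure
        p = F(1)
        for _, q in outcome:
            p *= q
        x = [[outcome[i * n + j][0] for j in range(n)] for i in range(m)]
        s2, sa2, sw2 = forms(x)
        e7 += p * s2 / (m * n - 1)
        e8 += p * sa2 / (m - 1)
        e9 += p * sw2 / (m * (n - 1))
    return e7, e8, e9

def formulas(cells):
    """Right-hand sides of (7), (8), (9)."""
    m, n = len(cells), len(cells[0])
    alpha = [[sum(v * q for v, q in c) for c in row] for row in cells]
    sigma2 = [[sum(q * (v - alpha[i][j]) ** 2 for v, q in c)
               for j, c in enumerate(row)] for i, row in enumerate(cells)]
    alpha_mu = [sum(r) / F(n) for r in alpha]
    alpha_bar = sum(alpha_mu) / F(m)
    base = sum(map(sum, sigma2)) / F(m * n)
    e7 = base + sum((v - alpha_bar) ** 2 for r in alpha for v in r) / F(m * n - 1)
    e8 = base + F(n, m - 1) * sum((am - alpha_bar) ** 2 for am in alpha_mu)
    e9 = base + sum((v - alpha_mu[i]) ** 2
                    for i, r in enumerate(alpha) for v in r) / F(m * (n - 1))
    return e7, e8, e9

# Hypothetical 2 x 3 table of independent two-point cells with different laws.
cells = [[[(i + j, F(1, 3)), (i * j - 1, F(2, 3))] for j in range(3)] for i in range(2)]
assert exact_expectations(cells) == formulas(cells)
```

Normality is not assumed anywhere in this check, in line with the remark that these expectations do not depend on it.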


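From these equalities we deduce:

1. If the $m \cdot n$ theoretical mean values $\alpha_{\mu\nu}$ are all equal (Bernoulli series), then the
expectations in (7), (8), (9) are equal; i.e.

$$E\left[\frac{s^2}{mn-1}\right] = E\left[\frac{s_a^2}{m-1}\right] = E\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{mn}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2. \tag{10}$$

2. If the $\alpha_{\mu\nu}$ are equal "by rows" but differ from row to row (Lexis series), i.e.
$\alpha_{\mu\nu} = \alpha_\mu$ but $\alpha_\mu \ne \alpha$, then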


⁶ $V_{\mu\nu}(x)$ is a monotone non-decreasing function. Hence it has at most a denumerable set
of ordinary jump discontinuities; at such a point it is continuous to the right but not to
the left. Moreover, it possesses a finite derivative $v_{\mu\nu}(x)$ almost everywhere.





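$$E\left[\frac{s_a^2}{m-1} - \frac{s^2}{mn-1}\right] = \frac{mn(n-1)}{(m-1)(mn-1)}\sum_{\mu}(\alpha_\mu - \alpha)^2 > 0, \tag{11}$$

$$E\left[\frac{s_a^2}{m-1} - \frac{s_w^2}{m(n-1)}\right] = \frac{n}{m-1}\sum_{\mu}(\alpha_\mu - \alpha)^2 > 0. \tag{12}$$

3. If the $\alpha_{\mu\nu}$ are equal "by columns" but differ from column to column (Poisson
series), then $\alpha_{\mu\nu} = \alpha_\nu$, $\alpha_\mu = \alpha$, and

$$E\left[\frac{s_a^2}{m-1} - \frac{s^2}{mn-1}\right] = -\frac{m}{mn-1}\sum_{\nu}(\alpha_\nu - \alpha)^2 < 0, \tag{13}$$

$$E\left[\frac{s_a^2}{m-1} - \frac{s_w^2}{m(n-1)}\right] = -\frac{1}{n-1}\sum_{\nu}(\alpha_\nu - \alpha)^2 < 0. \tag{14}$$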
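The signs in (11)-(14) follow from (7)-(9) by direct substitution. The Python sketch below evaluates the right-hand sides of (7), (8), (9) for a hypothetical Lexis table and a hypothetical Poisson table of means with $m = 2$, $n = 3$ and every $\sigma^2_{\mu\nu} = 1$ (all values chosen only for illustration). It then compares the differences with the closed forms of (11)-(14).

```python
from fractions import Fraction as F

def expectations(alpha, sigma2):
    """Right-hand sides of (7), (8), (9) for m x n tables of means and variances."""
    m, n = len(alpha), len(alpha[0])
    a_mu = [sum(r) / F(n) for r in alpha]
    a = sum(a_mu) / m
    base = sum(map(sum, sigma2)) / F(m * n)
    e7 = base + sum((v - a) ** 2 for r in alpha for v in r) / F(m * n - 1)
    e8 = base + F(n, m - 1) * sum((x - a) ** 2 for x in a_mu)
    e9 = base + sum((v - a_mu[i]) ** 2
                    for i, r in enumerate(alpha) for v in r) / F(m * (n - 1))
    return e7, e8, e9

ones = [[1, 1, 1], [1, 1, 1]]      # sigma^2_{mu nu} = 1 in every cell
lexis = [[0, 0, 0], [2, 2, 2]]     # equal by rows: sum_mu (alpha_mu - alpha)^2 = 2
poisson = [[0, 3, 6], [0, 3, 6]]   # equal by columns: sum_nu (alpha_nu - alpha)^2 = 18

e7, e8, e9 = expectations(lexis, ones)
assert e8 - e7 == F(2 * 3 * 2, 1 * 5) * 2    # (11): mn(n-1)/((m-1)(mn-1)) * 2 = 24/5
assert e8 - e9 == F(3, 1) * 2                # (12): n/(m-1) * 2 = 6

e7, e8, e9 = expectations(poisson, ones)
assert e8 - e7 == -F(2, 5) * 18              # (13): -m/(mn-1) * 18 = -36/5
assert e8 - e9 == -F(1, 2) * 18              # (14): -1/(n-1) * 18 = -9
```

Because $\sigma^2_{\mu\nu}$ enters (7)-(9) only through the common first term, the differences do not depend on the variances, which is why the choice of `ones` loses no generality for this check.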


In the Lexis theory⁷ we speak of normal, supernormal or subnormal dispersion
depending on whether the observed value of $\dfrac{s_a^2}{m-1}$ is equal to, greater than or less than
that of $\dfrac{s^2}{mn-1}$, and we usually consider the quotient

$$L = \frac{s_a^2}{m-1} \Big/ \frac{s^2}{mn-1}. \tag{15}$$


In analysis of variance theory we usually compare $s_a^2/(m-1)$ (variance among
rows) with $s_w^2/m(n-1)$ (variance within rows) and introduce the quotient

$$V = \frac{s_a^2}{m-1} \Big/ \frac{s_w^2}{m(n-1)}. \tag{16}$$

It follows from (4) that if $L = 1$ then $V = 1$, and conversely. We may therefore
speak of normal or non-normal dispersion with respect either to $L$ or to $V$.

The results given by equations (10)-(14) can be expressed as follows: If the
$m \cdot n$ theoretical distributions are all equal, the mathematical expectations of $s^2/r$, of
$s_a^2/r_a$ and of $s_w^2/r_w$ are equal. In the case of a Lexis series the expectation of $s_a^2/r_a$
is greater than that of $s^2/r$ and greater than that of $s_w^2/r_w$, and in the case of a Poisson series the
opposite is true.

We generally use these facts in order to make inferences about the unknown
populations $V_{\mu\nu}(x)$ from the observed values of our test functions. If, e.g., the observed
value of $s_a^2/r_a$ is "significantly"⁸ greater than that of $s^2/r$, we may assume
that the theoretical distributions form a Lexis series. But of course such a
significant deviation can also be explained by quite different assumptions regarding
the populations (see Section 2, (c)).

(b). Mathematical expectation of the quotient of the test functions. We are going 
to prove in this section a theorem of some mathematical interest. This theorem 
is a generalization of an analogous theorem in the Lexis theory [9]. 


⁷ The relation between these considerations and the Lexis theory will be dealt with in
another paper.

⁸ The meaning of the word "significantly" has, of course, still to be explained.





We have seen (10) that the mathematical expectations, defined by (6), of the
three test functions

$$S = \frac{s^2}{mn-1}, \qquad S' = \frac{s_a^2}{m-1}, \qquad S'' = \frac{s_w^2}{m(n-1)}$$

are equal if the $m \cdot n$ populations are equal (i.e. have identical distributions). We
will show that even in this case

$$E\left(\frac{S'}{S}\right) = 1, \qquad E\left(\frac{S''}{S}\right) = 1. \tag{17}$$

Let us put $mn = N$, and let the $N$ chance variables be arranged in a one-dimensional
sequence. As $S'$ and $S$ are of second degree in the $x_\nu$ $(\nu = 1, 2, \ldots, N)$,
we may write

$$S' - S = A + \sum_\nu B_\nu x_\nu + \sum_\nu C_\nu x_\nu^2 + \sum_{\nu \ne \rho} D_{\nu\rho} x_\nu x_\rho,$$

where the $A$, $B_\nu$, $C_\nu$ and $D_{\nu\rho}$ are constants. Now form the expectation, defined
by (6), of $S' - S$ under the assumption that the $N$ populations are equal,
$V_\nu(x) = V(x)$ $(\nu = 1, \ldots, N)$. Denoting by $\alpha$ and $\sigma^2$ the mean value and variance
of $V(x)$ and putting $\sum_\nu B_\nu = B$, $\sum_\nu C_\nu = C$, $\sum_{\nu \ne \rho} D_{\nu\rho} = D$, we find

$$E(S' - S) = A + B\alpha + C(\sigma^2 + \alpha^2) + D\alpha^2 = 0.$$

And as this equality holds for an arbitrary distribution $V(x)$, we deduce that
$A = B = C = D = 0$. Let us then compute, under the same assumption, the
expectation of $(S' - S)/S$. Now the expectations of $1/S$, $x_\nu/S$, $x_\nu^2/S$, $x_\nu x_\rho/S$
take the place of the expectations of $1$, $x_\nu$, $x_\nu^2$, $x_\nu x_\rho$. But these new expectations
are also independent of the indices, because of the equality of the $N$ populations
and the symmetry of $S$ in the $N$ variables $x_1, \ldots, x_N$. Hence we may write

$$E\left(\frac{1}{S}\right) = \gamma_0, \quad E\left(\frac{x_\nu}{S}\right) = \gamma_1, \quad E\left(\frac{x_\nu^2}{S}\right) = \gamma_2, \quad E\left(\frac{x_\nu x_\rho}{S}\right) = \gamma_3 \quad (\nu \ne \rho),$$

and we find

$$E\left(\frac{S' - S}{S}\right) = E\left(\frac{S'}{S}\right) - 1 = A\gamma_0 + B\gamma_1 + C\gamma_2 + D\gamma_3 = 0,$$

because $A = B = C = D = 0$. Hence $E(S'/S) = 1$.

We may prove in the same way that $E(S''/S) = 1$.
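The argument uses only the symmetry of $S$ in the $N$ variables and the equality of the $N$ populations, so it can be checked exactly for a discrete population. One extra assumption is needed for that. With a discrete $V(x)$, all observations coincide with positive probability, which makes $S = 0$. The sketch below therefore works on the event $S > 0$. The weight $1/S$ restricted to that event is still symmetric, so the same reasoning gives $E(S'/S \mid S > 0) = E(S''/S \mid S > 0) = 1$. The three-point population is an arbitrary, deliberately skewed choice.

```python
from fractions import Fraction as F
from itertools import product

def forms(x):
    """s^2, s_a^2, s_w^2 of (2) for an m x n table x (list of rows)."""
    m, n = len(x), len(x[0])
    a = sum(map(sum, x)) / F(m * n)
    a_mu = [sum(row) / F(n) for row in x]
    s2 = sum((v - a) ** 2 for row in x for v in row)
    sa2 = n * sum((am - a) ** 2 for am in a_mu)
    sw2 = sum((v - a_mu[i]) ** 2 for i, row in enumerate(x) for v in row)
    return s2, sa2, sw2

def mean_quotients(values, probs, m, n):
    """Exact E[S'/S | S > 0] and E[S''/S | S > 0] for m*n i.i.d. discrete draws."""
    num1 = num2 = mass = F(0)
    for idx in product(range(len(values)), repeat=m * n):
        p = F(1)
        for k in idx:
            p *= probs[k]
        flat = [values[k] for k in idx]
        x = [flat[i * n:(i + 1) * n] for i in range(m)]
        s2, sa2, sw2 = forms(x)
        if s2 == 0:
            continue  # all draws equal: S = S' = S'' = 0, quotients undefined
        S = s2 / (m * n - 1)
        num1 += p * (sa2 / (m - 1)) / S
        num2 += p * (sw2 / (m * (n - 1))) / S
        mass += p
    return num1 / mass, num2 / mass

assert mean_quotients([0, 1, 5], [F(1, 2), F(1, 3), F(1, 6)], 2, 3) == (1, 1)
```

Exact fractions are used instead of simulation so that the equality to 1 is tested without sampling error.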

We have, however, proved (17) only under the assumption that all the $N$
populations are equal, whereas (10) is true under the mere hypothesis that the
mean values of the populations $V_\nu(x)$ are the same.

(c). The variances of the test functions. The distributions of our test functions
and of their quotients $V$ or $L$ have been determined and tabulated by R. A.
Fisher under the hypothesis that the $m \cdot n$ chance variables are independent and
obey the same normal Gaussian law. Consequently, by means of Fisher's distribution
we can test only the hypothesis that the theoretical populations have
both these properties.

If in a statistical problem it is not possible to determine the exact distributions
of the test functions under sufficiently general assumptions regarding the populations,
one of the following procedures is frequently used:

a) One tries to find an asymptotic solution of the problem, i.e. to determine the
distribution of the test functions in question for large samples. The distribution
of the analysis of variance quotient, as $n$ tends to infinity, has been established
by W. G. Madow [6]. The same problem for the Lexis quotient was solved as
early as 1873 by Helmert [4]. As $m$ tends to infinity the limiting distribution
is a Gaussian distribution, which follows from general theorems of v. Mises [7].

b) For small samples, i.e. if $m$ and $n$ are finite, we may determine the expectations
and the variances of the test functions for appropriately general populations
and establish in this way a test of significance.

In this section we shall compute the variances of our test functions. Let us
first assume arbitrary but equal populations $V_{\mu\nu}(x) = V(x)$ and denote by $\mu_i$
the $i$th moment about the mean of $V(x)$:

$$\alpha = \int x\,dV(x), \qquad \mu_i = \int (x - \alpha)^i\,dV(x) \quad (i = 1, 2, \ldots), \qquad \mu_2 = \sigma^2. \tag{18}$$

Then, using a well-known formula for the variance of a sample variance, we find immediately the variance of $S = \dfrac{s^2}{mn-1}$:

$$\operatorname{Var}\left(\frac{s^2}{mn-1}\right) = \operatorname{Var}\left(\frac{\sum_\mu\sum_\nu (x_{\mu\nu} - a)^2}{mn-1}\right) = \frac{1}{mn}\left(\mu_4 - \frac{mn-3}{mn-1}\mu_2^2\right). \tag{19}$$
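Formula (19) can be confirmed exactly for a small discrete population by enumerating all $N = mn$ i.i.d. draws. The Python sketch below does this with rational arithmetic for an arbitrary skewed three-point distribution (an illustrative assumption). `central_moment` computes the $\mu_i$ of (18), and the other names are likewise introduced here.

```python
from fractions import Fraction as F
from itertools import product

def central_moment(values, probs, i):
    """mu_i of (18) for a discrete V(x) with the given atoms."""
    alpha = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - alpha) ** i for v, p in zip(values, probs))

def exact_var_of_S(values, probs, N):
    """Exact Var(s^2/(N-1)) for N i.i.d. draws, N = mn."""
    e1 = e2 = F(0)
    for idx in product(range(len(values)), repeat=N):
        p = F(1)
        for k in idx:
            p *= probs[k]
        x = [values[k] for k in idx]
        xbar = sum(x) / F(N)
        S = sum((v - xbar) ** 2 for v in x) / (N - 1)
        e1 += p * S
        e2 += p * S * S
    return e2 - e1 ** 2

def formula_19(values, probs, N):
    mu2 = central_moment(values, probs, 2)
    mu4 = central_moment(values, probs, 4)
    return (mu4 - F(N - 3, N - 1) * mu2 ** 2) / N

values, probs = [0, 1, 3], [F(1, 2), F(1, 4), F(1, 4)]
for N in (2, 3, 4):
    assert exact_var_of_S(values, probs, N) == formula_19(values, probs, N)
```

Only the total number $N = mn$ of observations enters (19), so the grouping into rows plays no role here.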


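If we need the analogous variance in case of different populations, we let

$$t^2 = \sum_{\rho=1}^{r} (y_\rho - b)^2, \qquad \text{where } b = \frac{1}{r}(y_1 + \cdots + y_r),$$

and let $V_\rho(y)$ $(\rho = 1, \ldots, r)$ be $r$ populations, and

$$\beta_\rho = \int y\,dV_\rho(y), \qquad \frac{1}{r}\sum_{\rho=1}^{r}\beta_\rho = \beta,$$

$$\int (y - \beta_\rho)^i\,dV_\rho(y) = \mu_i^{(\rho)} \quad (i = 1, 2, \ldots;\ \rho = 1, 2, \ldots, r), \qquad \mu_2^{(\rho)} = \sigma_\rho^2.$$

Then the following formula may be used: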


Var (f 2 ) 


( 20 ) 




- vj 


+ *—± SAP, - /»)* + 4 i - 3) 2 +il A . 

T p-1 p—l ~ P< T 





We may check (20) by putting the $V_\rho(y)$ all equal to $V(y)$, and find

$$\operatorname{Var}(t^2) = \frac{r-1}{r}\left[(r-1)\mu_4 - (r-3)\mu_2^2\right], \tag{20'}$$

in accordance with (19).

In order to determine the variance of $s_a^2$ by means of these formulae we consider
$\frac{1}{m}\sum_{\mu=1}^{m}(a_\mu - a)^2$ as a sample variance. The $n$ distributions in the $\mu$th row
are $V_{\mu 1}(x), V_{\mu 2}(x), \ldots, V_{\mu n}(x)$, or, if we assume that they are all equal, simply
$V(x)$. Let us put $\frac{1}{n}x_{\mu\nu} = z_{\mu\nu}$ and $V(x_{\mu\nu}) = V'(z_{\mu\nu})$, and denote by
$W(a_\mu)$ the distribution of the average of the elements in the $\mu$th row:

$$W(a_\mu) = \int \cdots \int dV'(z_1)\,dV'(z_2) \cdots dV'(z_{n-1})\,V'(a_\mu - z_1 - \cdots - z_{n-1}).$$

There is such a distribution for each row, and we have to find the variance of
$\sum_\mu (a_\mu - a)^2$ with respect to the combination of these $m$ distributions. In order
to be able to apply (20') we need the second and fourth moments of these
distributions. We have for the mean value $\alpha'$ of $W(a_\mu)$:

$$\alpha' = n \cdot (\text{mean value of } V') = n \cdot \frac{\alpha}{n} = \alpha,$$

and for the variance $\mu_2'$ of $W(a_\mu)$: $\mu_2' = \mu_2/n$. We still need $\mu_4'$. By repeated use
of the formula

$$\iint \left[(x_1 - \alpha_1) + (x_2 - \alpha_2)\right]^4 dV(x_1)\,dV(x_2) = \int (x_1 - \alpha_1)^4\,dV(x_1) + \int (x_2 - \alpha_2)^4\,dV(x_2) + 6\int (x_1 - \alpha_1)^2\,dV(x_1)\int (x_2 - \alpha_2)^2\,dV(x_2),$$

and of the fact that $W(a_\mu)$ is simply the distribution of the sum of $n$ variables
$z_{\mu\nu}$, we get

$$\mu_4' = \frac{1}{n^4}\left(n\mu_4 + 6\binom{n}{2}\mu_2^2\right) = \frac{1}{n^3}\left(\mu_4 + 3(n-1)\mu_2^2\right),$$

where $\mu_4$ and $\mu_2$ are the values introduced in (18).

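We now apply (20') and get

$$\operatorname{Var}\left[\sum_\mu (a_\mu - a)^2\right] = \frac{m-1}{m}\left[(m-1)\mu_4' - (m-3)\mu_2'^2\right],$$

and substituting the values of $\mu_2'$ and $\mu_4'$, we find by an easy computation the
final result:

$$\operatorname{Var}\left(\frac{s_a^2}{m-1}\right) = \operatorname{Var}\left(\frac{n\sum_\mu (a_\mu - a)^2}{m-1}\right) = \frac{1}{mn}\left(\mu_4 - 3\mu_2^2\right) + \frac{2}{m-1}\mu_2^2. \tag{21}$$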
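The same kind of exact enumeration checks (21), including the term $\frac{1}{mn}(\mu_4 - 3\mu_2^2)$ that vanishes for a Gaussian population. The sketch below assumes, as in the derivation, $m \cdot n$ i.i.d. draws from one discrete distribution, again an arbitrary three-point choice. The helper names are introduced here.

```python
from fractions import Fraction as F
from itertools import product

def central_moment(values, probs, i):
    """mu_i of (18) for a discrete V(x)."""
    alpha = sum(p * v for v, p in zip(values, probs))
    return sum(p * (v - alpha) ** i for v, p in zip(values, probs))

def exact_var_sa(values, probs, m, n):
    """Exact Var(s_a^2/(m-1)) with s_a^2 = n * sum_mu (a_mu - a)^2, m*n i.i.d. draws."""
    e1 = e2 = F(0)
    for idx in product(range(len(values)), repeat=m * n):
        p = F(1)
        for k in idx:
            p *= probs[k]
        x = [values[k] for k in idx]
        a_mu = [sum(x[i * n:(i + 1) * n]) / F(n) for i in range(m)]   # row averages
        a = sum(a_mu) / m
        T = n * sum((am - a) ** 2 for am in a_mu) / (m - 1)
        e1 += p * T
        e2 += p * T * T
    return e2 - e1 ** 2

def formula_21(values, probs, m, n):
    mu2 = central_moment(values, probs, 2)
    mu4 = central_moment(values, probs, 4)
    return (mu4 - 3 * mu2 ** 2) / (m * n) + F(2, m - 1) * mu2 ** 2

values, probs = [0, 1, 3], [F(1, 2), F(1, 4), F(1, 4)]
for m, n in [(2, 2), (3, 2), (2, 3)]:
    assert exact_var_sa(values, probs, m, n) == formula_21(values, probs, m, n)
```

Comparing `formula_21` with the previous `formula_19` for growing $n$ shows the point made next in the text: (21) stays of order $1/m$, while (19) is of order $1/mn$.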


If we compare this last formula with (19), we see that the right side in (21)
is of order $1/m$, whereas that in (19) is of order $1/mn$. Therefore, for sufficiently
large values of $n$, $s^2/r$ will be "more exact" than $s_a^2/r_a$. In some presentations
of the Lexis theory it is implied that the value $s_a^2/r_a$ is to be compared with the
theoretical or exact value $s^2/r$; we may see a certain justification for this idea in
the result just mentioned. This may lead us also to use $s^2/r$ as an unbiased
estimate of the unknown population variance if $\alpha_{\mu\nu} = \alpha$ (see (7) and (8)).

By means of the simple formulae (19) and (21) we can now easily test whether
the values of $s^2/r$ and $s_a^2/r_a$, whose expectations are equal in case of equal populations,
differ significantly from each other. Of course we must compute, as usual,
approximate values of $\mu_2$ and $\mu_4$ from the observations. If $n$ is comparatively
large (as it usually is, e.g., in the Lexis theory) only the term $\dfrac{2}{m-1}\mu_2^2$ will be
significant. If the hypothetical population is Gaussian ($\mu_4 = 3\mu_2^2$), the right
side of (21) reduces to $\dfrac{2}{m-1}\mu_2^2$ and that of (19) to $\dfrac{2}{mn-1}\mu_2^2$; hence these variances
are in the ratio $\dfrac{1}{r_a} : \dfrac{1}{r}$, as one might expect.


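2. Symmetric Test Functions.

(a). New equalities for Lexis and Poisson series. In Section 1, starting with the
formula $s^2 = s_a^2 + s_w^2$, we used the test functions $s^2/r$, $s_a^2/r_a$, $s_w^2/r_w$. This implied
a difference between rows and columns, which is often justified, e.g. in the Lexis
theory. The following decomposition of $s^2$ is symmetric with respect to rows
and columns. Let

$$a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} x_{\mu\nu}, \qquad \bar a_\nu = \frac{1}{m}\sum_{\mu=1}^{m} x_{\mu\nu}, \qquad a = \frac{1}{mn}\sum_{\mu=1}^{m}\sum_{\nu=1}^{n} x_{\mu\nu} = \frac{1}{m}\sum_{\mu=1}^{m} a_\mu = \frac{1}{n}\sum_{\nu=1}^{n} \bar a_\nu, \tag{1}$$

and

$$s^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - a)^2, \qquad s_a^2 = n\sum_\mu (a_\mu - a)^2, \qquad s_w^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - a_\mu)^2, \tag{2}$$

$$S^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - a_\mu - \bar a_\nu + a)^2, \qquad \bar s_a^2 = m\sum_\nu (\bar a_\nu - a)^2, \qquad \bar s_w^2 = \sum_\mu\sum_\nu (x_{\mu\nu} - \bar a_\nu)^2,$$

with the respective ranks

$$r = mn - 1, \qquad r_a = m - 1, \qquad r_w = m(n - 1), \tag{3}$$

$$R = (m-1)(n-1), \qquad \bar r_a = n - 1, \qquad \bar r_w = n(m - 1).$$

Then

$$s^2 = s_a^2 + \bar s_a^2 + S^2 = s_a^2 + s_w^2 = \bar s_a^2 + \bar s_w^2 \tag{4}$$

and

$$r = r_a + \bar r_a + R = r_a + r_w = \bar r_a + \bar r_w. \tag{5}$$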
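The identities (4) and (5) are purely algebraic, so they can be checked on any table of numbers. The Python sketch below computes the six quadratic forms of (2) for an arbitrary $3 \times 4$ table (the entries are illustrative) and checks both decompositions with exact rational arithmetic. `two_way_forms` is a name introduced here.

```python
from fractions import Fraction as F

def two_way_forms(x):
    """The six quadratic forms of (2), Section 2, for an m x n table x (list of rows)."""
    m, n = len(x), len(x[0])
    a = sum(map(sum, x)) / F(m * n)
    a_row = [sum(r) / F(n) for r in x]                                  # a_mu
    a_col = [sum(x[i][j] for i in range(m)) / F(m) for j in range(n)]   # bar a_nu
    cells = [(i, j) for i in range(m) for j in range(n)]
    s2 = sum((x[i][j] - a) ** 2 for i, j in cells)
    sa2 = n * sum((v - a) ** 2 for v in a_row)
    sw2 = sum((x[i][j] - a_row[i]) ** 2 for i, j in cells)
    sa2_bar = m * sum((v - a) ** 2 for v in a_col)
    sw2_bar = sum((x[i][j] - a_col[j]) ** 2 for i, j in cells)
    S2 = sum((x[i][j] - a_row[i] - a_col[j] + a) ** 2 for i, j in cells)
    return s2, sa2, sw2, sa2_bar, sw2_bar, S2

x = [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]   # arbitrary 3 x 4 table
m, n = len(x), len(x[0])
s2, sa2, sw2, sa2_bar, sw2_bar, S2 = two_way_forms(x)
assert s2 == sa2 + sa2_bar + S2 == sa2 + sw2 == sa2_bar + sw2_bar          # (4)

r, r_a, r_w = m * n - 1, m - 1, m * (n - 1)
R, rb_a, rb_w = (m - 1) * (n - 1), n - 1, n * (m - 1)
assert r == r_a + rb_a + R == r_a + r_w == rb_a + rb_w                      # (5)
```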

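We find the expectations of these forms under the assumption of arbitrary
populations $V_{\mu\nu}(x)$ which are independent and different from each other. We
then specialize to Bernoulli series, Lexis series and Poisson series of populations
respectively. Denoting by $\alpha_{\mu\nu}$ and $\sigma_{\mu\nu}^2$ the mean value and variance of $V_{\mu\nu}(x)$,
and by

$$\alpha_\mu = \frac{1}{n}\sum_\nu \alpha_{\mu\nu}, \qquad \bar\alpha_\nu = \frac{1}{m}\sum_\mu \alpha_{\mu\nu}, \qquad \alpha = \frac{1}{mn}\sum_\mu\sum_\nu \alpha_{\mu\nu}, \tag{6}$$

we find for the expected values defined in (6), Section 1:

$$E\left[\frac{s^2}{mn-1}\right] = \frac{1}{mn}\sum\sum\sigma_{\mu\nu}^2 + \frac{1}{mn-1}\sum\sum(\alpha_{\mu\nu} - \alpha)^2, \tag{7}$$

$$E\left[\frac{s_a^2}{m-1}\right] = \frac{1}{mn}\sum\sum\sigma_{\mu\nu}^2 + \frac{n}{m-1}\sum_\mu(\alpha_\mu - \alpha)^2,$$

$$E\left[\frac{\bar s_a^2}{n-1}\right] = \frac{1}{mn}\sum\sum\sigma_{\mu\nu}^2 + \frac{m}{n-1}\sum_\nu(\bar\alpha_\nu - \alpha)^2,$$

$$E\left[\frac{S^2}{(m-1)(n-1)}\right] = \frac{1}{mn}\sum\sum\sigma_{\mu\nu}^2 + \frac{1}{(m-1)(n-1)}\sum\sum(\alpha_{\mu\nu} - \alpha_\mu - \bar\alpha_\nu + \alpha)^2,$$

$$E\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{mn}\sum\sum\sigma_{\mu\nu}^2 + \frac{1}{m(n-1)}\sum\sum(\alpha_{\mu\nu} - \alpha_\mu)^2,$$

$$E\left[\frac{\bar s_w^2}{n(m-1)}\right] = \frac{1}{mn}\sum\sum\sigma_{\mu\nu}^2 + \frac{1}{n(m-1)}\sum\sum(\alpha_{\mu\nu} - \bar\alpha_\nu)^2.$$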


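In the Bernoulli case, which as far as the author knows is the only one which
has been considered in this connection [5], we get the well-known result:

$$E_B\left[\frac{s^2}{mn-1}\right] = E_B\left[\frac{s_a^2}{m-1}\right] = E_B\left[\frac{s_w^2}{m(n-1)}\right] = E_B\left[\frac{\bar s_a^2}{n-1}\right] = E_B\left[\frac{\bar s_w^2}{n(m-1)}\right] = E_B\left[\frac{S^2}{(m-1)(n-1)}\right]. \tag{8}$$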


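Now let us assume a Lexis series, with

$$\alpha_{\mu\nu} = \alpha_\mu; \qquad \alpha_\mu \ne \alpha; \qquad \bar\alpha_\nu = \alpha; \qquad \sigma_{\mu\nu} = \sigma_\mu. \tag{9}$$

Then (7) reduces to

$$E_L\left[\frac{s^2}{mn-1}\right] = \frac{1}{m}\sum_\mu\sigma_\mu^2 + \frac{n}{mn-1}\sum_\mu(\alpha_\mu - \alpha)^2, \tag{10}$$

$$E_L\left[\frac{s_a^2}{m-1}\right] = \frac{1}{m}\sum_\mu\sigma_\mu^2 + \frac{n}{m-1}\sum_\mu(\alpha_\mu - \alpha)^2,$$

$$E_L\left[\frac{\bar s_w^2}{n(m-1)}\right] = \frac{1}{m}\sum_\mu\sigma_\mu^2 + \frac{1}{m-1}\sum_\mu(\alpha_\mu - \alpha)^2,$$

$$E_L\left[\frac{\bar s_a^2}{n-1}\right] = E_L\left[\frac{S^2}{(m-1)(n-1)}\right] = E_L\left[\frac{s_w^2}{m(n-1)}\right] = \frac{1}{m}\sum_\mu\sigma_\mu^2.$$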


From these formulae we deduce, besides the inequalities (11), (12) of Section 1
and the corresponding formulae where the role of rows and columns is interchanged,
the further inequalities

$$E_L\left[\frac{s_a^2}{m-1}\right] > E_L\left[\frac{\bar s_w^2}{n(m-1)}\right] > E_L\left[\frac{\bar s_a^2}{n-1}\right]. \tag{11}$$


But there are also characteristic equalities, namely

$$E_L\left[\frac{\bar s_a^2}{n-1}\right] = E_L\left[\frac{S^2}{(m-1)(n-1)}\right] = E_L\left[\frac{s_w^2}{m(n-1)}\right]. \tag{12}$$

These equalities⁹ seem often to be more appropriate than the usual inequalities
in testing the hypothesis of a Lexis series.

Let us finally consider the Poisson case, which is very often neglected. There
we have

$$\alpha_{\mu\nu} = \bar\alpha_\nu; \qquad \bar\alpha_\nu \ne \alpha; \qquad \alpha_\mu = \alpha; \qquad \sigma_{\mu\nu} = \sigma_\nu. \tag{13}$$

Then, besides the inequalities (13), (14) of Section 1 and the corresponding
ones where the role of rows and columns is interchanged, we find the new
inequality:


(14) 




[a, — a) 1 < 0 , 


which of course corresponds to the Lexis inequality (11). The characteristic
equalities are now:

$$E_P\left[\frac{s_a^2}{m-1}\right] = E_P\left[\frac{S^2}{(m-1)(n-1)}\right] = E_P\left[\frac{\bar s_w^2}{n(m-1)}\right]. \tag{15}$$


These equalities (12) and (15) can be used in testing the hypothesis of Lexis or
Poisson series respectively, in the same way as the equalities (8) for the Bernoulli
case. We shall deal with the variances of these test functions in (d) of
this section.
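The equalities (12) and (15), and the ordering (11), can be checked exactly by enumerating independent discrete cells, as in the Bernoulli case. In the sketch below, the row distributions (Lexis case, identical within each row) and the column distributions (Poisson case, identical within each column) are arbitrary two-point choices made for illustration. The list returned by `six_expectations` (a name introduced here) holds the expectations in the order $s^2/r$, $s_a^2/r_a$, $s_w^2/r_w$, $\bar s_a^2/\bar r_a$, $\bar s_w^2/\bar r_w$, $S^2/R$.

```python
from fractions import Fraction as F
from itertools import product

def two_way_forms(x):
    """(s2, sa2, sw2, sa2_bar, sw2_bar, S2) of (2), Section 2, for an m x n table."""
    m, n = len(x), len(x[0])
    a = sum(map(sum, x)) / F(m * n)
    a_row = [sum(r) / F(n) for r in x]
    a_col = [sum(x[i][j] for i in range(m)) / F(m) for j in range(n)]
    cells = [(i, j) for i in range(m) for j in range(n)]
    return (sum((x[i][j] - a) ** 2 for i, j in cells),
            n * sum((v - a) ** 2 for v in a_row),
            sum((x[i][j] - a_row[i]) ** 2 for i, j in cells),
            m * sum((v - a) ** 2 for v in a_col),
            sum((x[i][j] - a_col[j]) ** 2 for i, j in cells),
            sum((x[i][j] - a_row[i] - a_col[j] + a) ** 2 for i, j in cells))

def six_expectations(cells):
    """Exact expectations of the six test functions for independent discrete cells."""
    m, n = len(cells), len(cells[0])
    ranks = [m * n - 1, m - 1, m * (n - 1), n - 1, n * (m - 1), (m - 1) * (n - 1)]
    E = [F(0)] * 6
    for outcome in product(*[c for row in cells for c in row]):
        p = F(1)
        for _, q in outcome:
            p *= q
        x = [[outcome[i * n + j][0] for j in range(n)] for i in range(m)]
        for k, form in enumerate(two_way_forms(x)):
            E[k] += p * form / ranks[k]
    return E

row_dists = [[(0, F(1, 2)), (1, F(1, 2))], [(2, F(1, 3)), (5, F(2, 3))]]
col_dists = [[(0, F(1, 2)), (1, F(1, 2))], [(2, F(1, 3)), (5, F(2, 3))],
             [(-1, F(1, 4)), (3, F(3, 4))]]
lexis = [[row_dists[i]] * 3 for i in range(2)]    # m = 2, n = 3, equal within rows
poisson = [list(col_dists) for _ in range(2)]     # m = 2, n = 3, equal within columns

EL = six_expectations(lexis)
assert EL[3] == EL[5] == EL[2]      # (12)
assert EL[1] > EL[4] > EL[3]        # (11)
EP = six_expectations(poisson)
assert EP[1] == EP[5] == EP[4]      # (15)
assert EP[3] > EP[2] > EP[1]        # rows and columns interchanged
```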

(b). Mathematical expectations of the quotients of certain test functions. We
have seen that in case of a Lexis series the expectations of $\dfrac{\bar s_a^2}{n-1}$, of
$\dfrac{S^2}{(m-1)(n-1)}$ and of $\dfrac{s_w^2}{m(n-1)}$ are equal. We will show that even in this case

⁹ See [10], pp. 81-90, for proofs of these inequalities for the case of normal populations.





( 16 ) 


[ 


« $0 / 8m 

' L Jl ~ 1 / 77l(?l — 


1)J 


min — 1), 
s 

Ei ' 


/ _ lS 

/ (m - 1) 

L-i/ 

»r _. / 4 

W 'L(to — l)(n - 1)/ m(n — 1)J 


1)(4- 1)J 

(to ~ l)(n - 1)_ 


- 1, 
- 1, 

SSI 1 

1 1 


-1. 


s »s 

Let us write for the moment: = 7* and , - -- 

n—1 (772. — l)(n ~ 1) 

T and T are of second degree in the x,,, we may write: 


= T. 


As both 


T - t - a + E /w + E c H ,xl + 

Jltf 


where the A, B, C, D are constants. The lost sum contains %-mn(mn — 1) 
terms and net both ju =» and i = j hold. Compute the expectation of T ~ T 
with respect to populations which form a Lexis series F^,(x) = F„(j). Denote 
by , <r\ tire respective mean values and variances. We then have because 
of (11): 

0 - E,{T - T] - A + E «„ E R„ 

M t 

+ E (4 ■+*«J) E^+ E «si««> E 

j* ** Kt*xt «.j 

or introducing E B** =* ^ 1 E “ C* ; E ** A, IJ1S we (get: 

* V it) 

o-2U?r- h- a + E«f^ + E(4 + «J)c^ + E ««• 


As this equality is exact for an arbitrary set of V w (£) we deduce that A = 0, 

■B(i — 0, Cp = 0> D hll ij = 0. 

Let us now compute under the same assumption the expectation of (T — T) / T. 
Here the expectations of 1/T, a>/T etc. will take the place of the expectations 
of 1, , ■ • • . But these new expectations will not depend on the index v 

(index within the row) because the populations are the same within each row 
and because of the symmetry of T in the m: n variables x „,. Hence we can put 



and we get 



E (V) = 1,, E (fy = l Ml E , etc. 

- 1) - au + E is, + E lc, + E - o. 


because all the coefficients are equal to zero. Our theorem is thus proved. The
same conclusion holds if the denominator, without being symmetric in all the
$m \cdot n$ variables, does not depend on the row index. And as this last property
holds for $s_w^2$, the expectations (16) are all shown to be equal to one.

Analogous relations are valid for Poisson series.

(c). Non-independent populations. In this section we omit the assumption of
independence of the $m \cdot n$ populations, but assume the theoretical population to
be a general $mn$-variate distribution:

$$V(x_{11}, x_{12}, \ldots, x_{mn}). \tag{17'}$$

From $V(x_{11}, x_{12}, \ldots, x_{mn})$ we derive the $mn$ one-dimensional distributions $V_{\mu\nu}(x)$
$(\mu = 1, \ldots, m;\ \nu = 1, \ldots, n)$ by letting all the variables except $x_{\mu\nu}$ tend to $+\infty$,
because $V_{\mu\nu}(x)$ is the probability that $x_{\mu\nu} \le x$ regardless of the values of the
other variables. In a similar way we derive the $\frac{1}{2}mn(mn-1)$ two-dimensional
distributions $V_{\mu_1\nu_1,\mu_2\nu_2}(x, y)$, that is, the probability that $x_{\mu_1\nu_1} \le x$ and $x_{\mu_2\nu_2} \le y$.
We get this distribution from (17') by letting all the variables with the exception of $x_{\mu_1\nu_1}$
and $x_{\mu_2\nu_2}$ tend to $+\infty$. We denote as before by $\alpha_{\mu\nu}$ and $\sigma^2_{\mu\nu}$ the expectation of $x_{\mu\nu}$
and of $(x_{\mu\nu} - \alpha_{\mu\nu})^2$ respectively. But the expectation of $(x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})$,
which was zero in case of the independence of $x_{\mu_1\nu_1}$ and $x_{\mu_2\nu_2}$, may now differ
from zero. Denote by $\mathcal{E}$ the expectation with respect to (17'). Then:

$$\mathcal{E}\left[(x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})\right] = \int \cdots \int (x_{\mu_1\nu_1} - \alpha_{\mu_1\nu_1})(x_{\mu_2\nu_2} - \alpha_{\mu_2\nu_2})\,dV(x_{11}, \ldots, x_{mn}) = \iint (x - \alpha_{\mu_1\nu_1})(y - \alpha_{\mu_2\nu_2})\,dV_{\mu_1\nu_1,\mu_2\nu_2}(x, y) = R_{\mu_1\nu_1,\mu_2\nu_2}. \tag{17}$$


Let us first deduce a general formula for the expectation of a sample variance
in the case of dependent populations. Let $P(y_1, \ldots, y_r)$ be the distribution of $r$
chance variables $y_1, \ldots, y_r$ which have the average $b$. Denoting by $\beta_\rho$ the expectation
of $y_\rho$ with respect to $P$, by $\beta$ the average of the $\beta_\rho$, by $\tau_\rho^2$ the expectation
of $(y_\rho - \beta_\rho)^2$, and by $R_{ij}$ that of $(y_i - \beta_i)(y_j - \beta_j)$, we find without difficulty,
for the expectation of the sample variance,

$$\operatorname{Exp}\left[\frac{1}{r}\sum_{\rho=1}^{r}(y_\rho - b)^2\right] = \frac{1}{r}\int \cdots \int \sum_{\rho=1}^{r}(y_\rho - b)^2\,dP(y_1, \ldots, y_r) = \frac{r-1}{r^2}\sum_{\rho=1}^{r}\tau_\rho^2 + \frac{1}{r}\sum_{\rho=1}^{r}(\beta_\rho - \beta)^2 - \frac{2}{r^2}\sum_{i<j}R_{ij}. \tag{18}$$


Let us apply this result in the computation of the expectations of our test functions.
It is not difficult to compute them in the general case of different mean
values and variances, but we restrict ourselves to the consideration of certain
particular cases. Take first the case where all the $m \cdot n$ mean values $\alpha_{\mu\nu}$ are equal
to each other, and likewise the $mn$ variances and the $\frac{1}{2}mn(mn-1)$ covariances.
Denoting these magnitudes by $\alpha$, $\sigma^2$ and $R$ respectively, we see from (18) that:

$$\mathcal{E}\left(\frac{s^2}{mn-1}\right) = \mathcal{E}\left(\frac{s_a^2}{m-1}\right) = \mathcal{E}\left(\frac{\bar s_a^2}{n-1}\right) = \mathcal{E}\left(\frac{s_w^2}{m(n-1)}\right) = \mathcal{E}\left(\frac{\bar s_w^2}{n(m-1)}\right) = \mathcal{E}\left(\frac{S^2}{(m-1)(n-1)}\right) = \sigma^2 - R. \tag{19}$$

We have thus obtained the result that in the case of dependent populations just
described, the expectations of the six different test functions are still the same.
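Result (19) can be checked on a simple model with equal means, equal variances and equal covariances. For the illustration, assume $x_{\mu\nu} = Z + e_{\mu\nu}$, with one shared discrete variable $Z$ and i.i.d. discrete $e_{\mu\nu}$ independent of $Z$. Then every covariance is $R = \operatorname{Var}(Z)$ and $\sigma^2 = \operatorname{Var}(Z) + \operatorname{Var}(e)$. This construction is an assumption made here, not the author's. The sketch enumerates all outcomes exactly and confirms that the six expectations all equal $\sigma^2 - R$.

```python
from fractions import Fraction as F
from itertools import product

def two_way_forms(x):
    """(s2, sa2, sw2, sa2_bar, sw2_bar, S2) of (2), Section 2, for an m x n table."""
    m, n = len(x), len(x[0])
    a = sum(map(sum, x)) / F(m * n)
    a_row = [sum(r) / F(n) for r in x]
    a_col = [sum(x[i][j] for i in range(m)) / F(m) for j in range(n)]
    cells = [(i, j) for i in range(m) for j in range(n)]
    return (sum((x[i][j] - a) ** 2 for i, j in cells),
            n * sum((v - a) ** 2 for v in a_row),
            sum((x[i][j] - a_row[i]) ** 2 for i, j in cells),
            m * sum((v - a) ** 2 for v in a_col),
            sum((x[i][j] - a_col[j]) ** 2 for i, j in cells),
            sum((x[i][j] - a_row[i] - a_col[j] + a) ** 2 for i, j in cells))

def variance(dist):
    mean = sum(v * q for v, q in dist)
    return sum(q * (v - mean) ** 2 for v, q in dist)

def six_expectations_shared(z_dist, e_dist, m, n):
    """Exact expectations of the six test functions for x = Z + e_{mu nu}."""
    ranks = [m * n - 1, m - 1, m * (n - 1), n - 1, n * (m - 1), (m - 1) * (n - 1)]
    E = [F(0)] * 6
    for z, pz in z_dist:
        for es in product(e_dist, repeat=m * n):
            p = pz
            for _, q in es:
                p *= q
            x = [[z + es[i * n + j][0] for j in range(n)] for i in range(m)]
            for k, form in enumerate(two_way_forms(x)):
                E[k] += p * form / ranks[k]
    return E

z_dist = [(0, F(1, 2)), (2, F(1, 2))]   # shared component: R = Var(Z) = 1
e_dist = [(0, F(1, 2)), (1, F(1, 2))]   # individual component: Var(e) = 1/4
R = variance(z_dist)
sigma2 = R + variance(e_dist)
assert all(value == sigma2 - R for value in six_expectations_shared(z_dist, e_dist, 2, 3))
```

In this model every test function is unchanged by the common shift $Z$, which is one way to see why only $\sigma^2 - R$ survives.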

Of course we may assume many other particular kinds of mutual dependence of the populations. The following assumption seems appropriate for problems where rows and columns play different roles. We consider dependence only within each row; that is, we take only the variables $x_{\mu1},\dots,x_{\mu n}$ of one and the same row to be mutually dependent. The distribution (16) then has the following form:

$$V(x_{11},\dots,x_{mn}) = V_1(x_{11},\dots,x_{1n})\,V_2(x_{21},\dots,x_{2n})\cdots V_m(x_{m1},\dots,x_{mn}). \tag{20}$$

In the usual way we derive the $mn$ one-dimensional distributions $V_{\mu\nu}(x)$ and the $\frac12 mn(mn-1)$ two-dimensional distributions $V_{\mu_1\nu_1,\mu_2\nu_2}(x,y)$. If $\mu_1\neq\mu_2$, such a two-dimensional distribution reduces to the product of the respective one-dimensional distributions. Only the $\frac12 mn(n-1)$ bivariate distributions derived from one and the same $V_\mu(x_{\mu1},\dots,x_{\mu n})$ do not reduce in this way.

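Denoting again by $\mathcal{E}$ the expectation with respect to $V(x_{11},\dots,x_{mn})$, we find:

$$\mathcal{E}\big[(x_{\mu_1 i}-\alpha_{\mu_1 i})(x_{\mu_2 j}-\alpha_{\mu_2 j})\big] = \begin{cases} 0, & \mu_1\neq\mu_2,\\ R^{(\mu)}_{ij}, & \mu_1=\mu_2=\mu \text{ and } i\neq j.\end{cases} \tag{21}$$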

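Applying formula (18) to the computation of the expectations of $s^2$, $s_\mu^2$ and $s_a^2$, we find:

$$\begin{aligned}
\mathcal{E}\Big[\sum_{\mu}\sum_{\nu}(x_{\mu\nu}-a)^2\Big] &= \frac{mn-1}{mn}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2 + \sum_{\mu}\sum_{\nu}(\alpha_{\mu\nu}-\alpha)^2 - \frac{2}{mn}\sum_{\mu=1}^{m}\sum_{i<j}R^{(\mu)}_{ij},\\
\mathcal{E}\Big[\sum_{\mu}\sum_{\nu}(x_{\mu\nu}-a_\mu)^2\Big] &= \frac{n-1}{n}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2 + \sum_{\mu}\sum_{\nu}(\alpha_{\mu\nu}-\alpha_\mu)^2 - \frac{2}{n}\sum_{\mu=1}^{m}\sum_{i<j}R^{(\mu)}_{ij},\\
\mathcal{E}\Big[n\sum_{\mu}(a_\mu-a)^2\Big] &= \frac{m-1}{mn}\sum_{\mu}\sum_{\nu}\sigma_{\mu\nu}^2 + n\sum_{\mu}(\alpha_\mu-\alpha)^2 + \frac{2(m-1)}{mn}\sum_{\mu=1}^{m}\sum_{i<j}R^{(\mu)}_{ij}.
\end{aligned}\tag{22}$$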
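The three formulas (22) can be verified for arbitrary means, variances and within-row covariances. The following is a sketch under assumptions. The $2\times3$ table of means, variances and covariances below is hypothetical. Each left-hand side is computed exactly from the matrix of its quadratic form: $I-J/mn$ for $s^2$, $I-(I_m\otimes J_n/n)$ for $s_\mu^2$, and $(I_m\otimes J_n/n)-J/mn$ for $s_a^2$. Each is compared with the right-hand side of (22). The function name `check_22` is illustrative only.

```python
from fractions import Fraction


def check_22(m, n, alpha, sigma2, Rrow):
    """Return (lhs, rhs) for the three lines of (22). alpha[mu][nu], sigma2[mu][nu] are
    means and variances; Rrow[mu][(i, j)] (i < j) is the covariance of x_{mu i}, x_{mu j}.
    Variables in different rows are uncorrelated, as in (20)-(21)."""
    N = m * n
    row, col = (lambda k: k // n), (lambda k: k % n)
    mean = [alpha[row(k)][col(k)] for k in range(N)]

    def cov(k, l):
        if k == l:
            return sigma2[row(k)][col(k)]
        if row(k) != row(l):
            return 0
        return Rrow[row(k)][tuple(sorted((col(k), col(l))))]

    def expect(Q):  # E[x'Qx] = sum_kl Q_kl (Sigma_kl + mean_k mean_l)
        return sum(Q(k, l) * (cov(k, l) + mean[k] * mean[l]) for k in range(N) for l in range(N))

    d = lambda k, l: 1 if k == l else 0
    same_row = lambda k, l: 1 if row(k) == row(l) else 0
    lhs = [expect(lambda k, l: d(k, l) - Fraction(1, N)),                      # s^2
           expect(lambda k, l: d(k, l) - Fraction(same_row(k, l), n)),         # s_mu^2
           expect(lambda k, l: Fraction(same_row(k, l), n) - Fraction(1, N))]  # s_a^2

    S = sum(sum(r) for r in sigma2)                      # sum of all variances
    R = sum(sum(r.values()) for r in Rrow)               # sum of all R_ij^(mu)
    a = Fraction(sum(sum(r) for r in alpha), N)          # population grand mean alpha
    a_mu = [Fraction(sum(r), n) for r in alpha]          # population row means alpha_mu
    cells = [(i, j) for i in range(m) for j in range(n)]
    rhs = [Fraction(N - 1, N) * S + sum((alpha[i][j] - a) ** 2 for i, j in cells)
           - Fraction(2, N) * R,
           Fraction(n - 1, n) * S + sum((alpha[i][j] - a_mu[i]) ** 2 for i, j in cells)
           - Fraction(2, n) * R,
           Fraction(m - 1, N) * S + n * sum((v - a) ** 2 for v in a_mu)
           + Fraction(2 * (m - 1), N) * R]
    return lhs, rhs


alpha = [[1, 2, 4], [0, 3, -1]]          # hypothetical means
sigma2 = [[2, 1, 3], [1, 4, 2]]          # hypothetical variances
Rrow = [{(0, 1): Fraction(1, 2), (0, 2): Fraction(-1), (1, 2): Fraction(1, 3)},
        {(0, 1): Fraction(0), (0, 2): Fraction(3, 4), (1, 2): Fraction(-1, 5)}]
lhs, rhs = check_22(2, 3, alpha, sigma2, Rrow)
print(lhs == rhs)
```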





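Let us now suppose that all the $mn$ distributions are equal to each other or, at least, that

$$\alpha_{\mu\nu}=\alpha. \tag{23}$$

This assumption, which is characterized by (21), is of course different from the one which leads to (19). Set

$$\sum_{\mu=1}^{m}\sum_{i<j}R^{(\mu)}_{ij}=R \qquad\text{and}\qquad \frac{2}{mn(n-1)}\,R=\bar R,$$

so that $\bar R$ is the average of the $\frac12 mn(n-1)$ within-row covariances. By means of (22) we then find

$$\mathcal{E}\Big[\frac{s_a^2}{m-1}\Big]-\mathcal{E}\Big[\frac{s^2}{mn-1}\Big] = \frac{2}{mn-1}\,R = \frac{mn(n-1)}{mn-1}\,\bar R. \tag{24}$$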


Assuming $\bar R>0$ (positive average correlation), we may compare this result with (11) of Section 1. The term on the right side of (24) is of the same order of magnitude as that in (11). For negative $\bar R$ the term on the right side of (24) is negative, and the equation may be compared with (13) of Section 1. We see that, for the test functions $s^2/(mn-1)$ and $s_a^2/(m-1)$, a "positive (negative) average correlation within rows" has the same effect as a "Lexis (Poisson) series" of populations.
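Consider now the test functions $s_\alpha^2$ and $S^2$. We find

$$\mathcal{E}[s_\alpha^2] = \mathcal{E}\Big[m\sum_{\nu}(a_\nu-a)^2\Big] = \frac{n-1}{mn}\sum_{\mu}\sum_{\nu}\sigma^2_{\mu\nu} + m\sum_{\nu}(\alpha_\nu-\alpha)^2 - \frac{2}{mn}\,R, \tag{25}$$

and

$$\mathcal{E}[S^2] = \mathcal{E}\Big[\sum_{\mu}\sum_{\nu}(x_{\mu\nu}-a_\mu-a_\nu+a)^2\Big] = \frac{(m-1)(n-1)}{mn}\sum_{\mu}\sum_{\nu}\sigma^2_{\mu\nu} + \sum_{\mu}\sum_{\nu}(\alpha_{\mu\nu}-\alpha_\mu-\alpha_\nu+\alpha)^2 - \frac{2(m-1)}{mn}\,R. \tag{25'}$$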


Assuming (23), we get:

$$\mathcal{E}\Big[\frac{s_a^2}{m-1}\Big]-\mathcal{E}\Big[\frac{S^2}{(m-1)(n-1)}\Big] = n\bar R, \tag{26}$$

and, if $\bar R>0$:

$$\mathcal{E}\Big[\frac{s_a^2}{m-1}\Big] > \mathcal{E}\Big[\frac{s_\nu^2}{n(m-1)}\Big] > \mathcal{E}\Big[\frac{S^2}{(m-1)(n-1)}\Big]. \tag{26'}$$

The equality (26) is analogous to (11) and (14) of Section 2, for positive and negative $\bar R$ respectively.¹⁰ Under the assumption (23) we also get

$$\mathcal{E}\Big[\frac{s_\alpha^2}{n-1}\Big] = \mathcal{E}\Big[\frac{S^2}{(m-1)(n-1)}\Big] = \mathcal{E}\Big[\frac{s_\mu^2}{m(n-1)}\Big]. \tag{27}$$

¹⁰ I have studied in another paper the combination of Lexis series and "positive correlation within rows." It turns out that the two kinds of positive effects reinforce each other. The same is true for "negative correlation" and Poisson series. See [3].

These are the same equations as (12) of Section 2, and they hold for either sign of $\bar R$. Hence they provide no way to decide between Lexis series and correlated populations. But if we compute the expectations of the magnitudes which occur in (15) of Section 2, we find from (22), (25) and (25')

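$$\mathcal{E}\Big[\frac{s_a^2}{m-1}\Big] = \sigma^2+(n-1)\bar R,\qquad \mathcal{E}\Big[\frac{s_\nu^2}{n(m-1)}\Big] = \sigma^2,\qquad \mathcal{E}\Big[\frac{S^2}{(m-1)(n-1)}\Big] = \sigma^2-\bar R. \tag{28}$$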
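Under equal variances and the common mean (23), only the within-row covariances enter. The sketch below uses a hypothetical helper `normalized_expectations` and made-up covariances that differ from pair to pair and from row to row. It evaluates all six normalized test functions exactly as $\operatorname{tr}(Q\Sigma)/\text{rank}$. With $\bar R=2R/mn(n-1)$, the average within-row covariance, the results can be compared with (24) and (26)-(28).

```python
from fractions import Fraction


def normalized_expectations(m, n, sigma2, Rrow):
    """Exact E[.] of the six test functions divided by their ranks, when every x_{mu nu}
    has the same mean and variance sigma2 and covariances occur only within rows:
    Rrow[mu][(i, j)], i < j."""
    N = m * n
    row, col = (lambda k: k // n), (lambda k: k % n)
    d = lambda k, l: 1 if k == l else 0
    sr = lambda k, l: Fraction(1 if row(k) == row(l) else 0, n)   # entry of I_m x J_n / n
    sc = lambda k, l: Fraction(1 if col(k) == col(l) else 0, m)   # entry of J_m / m x I_n
    forms = {  # name: (entry of the quadratic-form matrix, rank)
        "s2": (lambda k, l: d(k, l) - Fraction(1, N), N - 1),
        "s_a2": (lambda k, l: sr(k, l) - Fraction(1, N), m - 1),
        "s_alpha2": (lambda k, l: sc(k, l) - Fraction(1, N), n - 1),
        "s_mu2": (lambda k, l: d(k, l) - sr(k, l), m * (n - 1)),
        "s_nu2": (lambda k, l: d(k, l) - sc(k, l), n * (m - 1)),
        "S2": (lambda k, l: d(k, l) - sr(k, l) - sc(k, l) + Fraction(1, N), (m - 1) * (n - 1)),
    }

    def cov(k, l):
        if k == l:
            return sigma2
        if row(k) != row(l):
            return 0
        return Rrow[row(k)][tuple(sorted((col(k), col(l))))]

    # Each form annihilates a constant mean vector, so E[x'Qx] = tr(Q Sigma).
    return {name: sum(Q(k, l) * cov(k, l) for k in range(N) for l in range(N)) / rank
            for name, (Q, rank) in forms.items()}


m, n, sigma2 = 3, 4, Fraction(5)
# Hypothetical within-row covariances (each row block stays positive definite).
Rrow = [{(i, j): Fraction(i + j + mu, 7) for i in range(n) for j in range(i + 1, n)}
        for mu in range(m)]
R = sum(sum(r.values()) for r in Rrow)        # sum of all within-row covariances
R_bar = 2 * R / (m * n * (n - 1))             # their average
E = normalized_expectations(m, n, sigma2, Rrow)
print(E["s_a2"] - sigma2, (n - 1) * R_bar)    # first line of (28)
print(E["s_nu2"] == sigma2, E["S2"] == sigma2 - R_bar)
```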


And hence we may say:

If the observed value of $s_a^2/(m-1)$ is greater than that of $s_\nu^2/n(m-1)$, this can be explained either by a Lexis series or by a positive correlation within rows. Their equality indicates a Poisson series. If the first is smaller than the second, we may assume negative correlation.

In the same way we may explain

$$\Big[\frac{s_\nu^2}{n(m-1)}\Big]_{\text{obs}} > \Big[\frac{S^2}{(m-1)(n-1)}\Big]_{\text{obs}}$$

either by positive correlation or by a Lexis series. Equality indicates a Poisson series, and the sign $<$ indicates negative correlation.

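(d). The variances of the test functions. We still have to find the variances of our test functions. Let us compute the variance of

$$S^2=\sum_{\mu}\sum_{\nu}(x_{\mu\nu}-a_\mu-a_\nu+a)^2$$

with respect to the $mn$-dimensional distribution $V_{11}(x_{11})\,V_{12}(x_{12})\cdots V_{mn}(x_{mn})$. Let us put

$$x_{\mu\nu}-a_\mu-a_\nu+a=y_{\mu\nu}; \tag{29}$$

then the average of the $y_{\mu\nu}$ equals zero:

$$\bar y=\frac{1}{mn}\sum_{\mu}\sum_{\nu}x_{\mu\nu}-\frac{1}{mn}\,n\sum_{\mu}a_\mu-\frac{1}{mn}\,m\sum_{\nu}a_\nu+a=a-a-a+a=0,$$

and

$$S^2=\sum_{\mu}\sum_{\nu}(x_{\mu\nu}-a_\mu-a_\nu+a)^2=\sum_{\mu}\sum_{\nu}(y_{\mu\nu}-\bar y)^2.$$

Each $y_{\mu\nu}$ is a linear function of the $x$'s; for example,

$$y_{11}=\frac{(m-1)(n-1)}{mn}\,x_{11}-\frac{m-1}{mn}\sum_{j=2}^{n}x_{1j}-\frac{n-1}{mn}\sum_{i=2}^{m}x_{i1}+\frac{1}{mn}\sum_{i=2}^{m}\sum_{j=2}^{n}x_{ij}=\lambda_1x_{11}+\lambda_2\sum_{j=2}^{n}x_{1j}+\lambda_3\sum_{i=2}^{m}x_{i1}+\lambda_4\sum_{i=2}^{m}\sum_{j=2}^{n}x_{ij}. \tag{30}$$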





Using the same notation as in Section 1 (c), we find, because of the independence of the chance variables,

$$\operatorname{Var}(y_{11})=\lambda_1^2\sigma^2+\lambda_2^2(n-1)\sigma^2+\lambda_3^2(m-1)\sigma^2+\lambda_4^2(m-1)(n-1)\sigma^2=\frac{(m-1)(n-1)}{mn}\,\sigma^2, \tag{31'}$$

and we find the same result for each $y_{\mu\nu}$:

$$\sigma'^2=\operatorname{Var}(y_{\mu\nu})=\frac{(m-1)(n-1)}{mn}\,\sigma^2, \tag{31}$$

in agreement with the fourth line of (7) of this section. We still need $M_4$, the fourth moment about the mean of $y_{\mu\nu}$. It can be computed from the fourth moment of a sum, and we find

$$M_4=A\mu_4+6B\sigma^4, \tag{32}$$

and we have

$$A=\lambda_1^4+(n-1)\lambda_2^4+(m-1)\lambda_3^4+(m-1)(n-1)\lambda_4^4=\frac{(m-1)(n-1)}{m^3n^3}\,(m^2-3m+3)(n^2-3n+3), \tag{33}$$

and

$$\begin{aligned} B={}&\lambda_1^2\{\lambda_2^2(n-1)+\lambda_3^2(m-1)+\lambda_4^2(m-1)(n-1)\}\\ &+\lambda_2^2(n-1)\{\tfrac12\lambda_2^2(n-2)+\lambda_3^2(m-1)+\lambda_4^2(m-1)(n-1)\}\\ &+\lambda_3^2(m-1)\{\tfrac12\lambda_3^2(m-2)+\lambda_4^2(m-1)(n-1)\}\\ &+\tfrac12\lambda_4^4(m-1)(n-1)[(m-1)(n-1)-1]. \end{aligned}$$

If we introduce the values of $\lambda_1,\lambda_2,\lambda_3,\lambda_4$, we find

$$\begin{aligned} m^4n^4B={}&(m-1)^3(n-1)^3(m+n)+(m-1)^2(n-1)^2(m+n-2)\\ &+\tfrac12(m-1)(n-1)\big[(m-1)^3(n-2)+(n-1)^3(m-2)+(mn-m-n)\big]; \end{aligned}\tag{34}$$

this expression, as well as that of $A$, is easily computed for different values of $m$ and $n$.

If $m$ and $n$ are large, $B$ is of order $\frac1m+\frac1n$. From (31)-(34) we see that in this case $\sigma'^2$ is approximately equal to $\sigma^2$, and $M_4$ to $\mu_4$.
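The coefficients $\lambda_1,\dots,\lambda_4$ of (30) and the constants in (31)-(34) can be checked with exact rational arithmetic. In this sketch the helper names are hypothetical. $B$ is computed as $\sum_{i<j}w_i^2w_j^2=\frac12\big[(\sum w_i^2)^2-\sum w_i^4\big]$ over the $mn$ weights $w_i$ of $y_{11}$; this is the same sum that the expression for $B$ enumerates group by group.

```python
from fractions import Fraction


def lambda_weights(m, n):
    """The m*n coefficients of y_11 in (30), one per x variable."""
    N = m * n
    return ([Fraction((m - 1) * (n - 1), N)]            # lambda_1: x_11 itself
            + [Fraction(-(m - 1), N)] * (n - 1)          # lambda_2: rest of row 1
            + [Fraction(-(n - 1), N)] * (m - 1)          # lambda_3: rest of column 1
            + [Fraction(1, N)] * ((m - 1) * (n - 1)))    # lambda_4: all other cells


def moments_direct(m, n):
    """sigma'^2 / sigma^2, A and B computed straight from the weights."""
    sq = [w * w for w in lambda_weights(m, n)]
    var_factor = sum(sq)
    A = sum(v * v for v in sq)
    B = (var_factor ** 2 - A) / 2          # sum over pairs i < j of w_i^2 w_j^2
    return var_factor, A, B


def moments_closed_form(m, n):
    """The same three quantities from (31), (33) and (34)."""
    N = m * n
    var_factor = Fraction((m - 1) * (n - 1), N)
    A = Fraction((m - 1) * (n - 1) * (m * m - 3 * m + 3) * (n * n - 3 * n + 3), N ** 3)
    B = Fraction(2 * (m - 1) ** 3 * (n - 1) ** 3 * (m + n)
                 + 2 * (m - 1) ** 2 * (n - 1) ** 2 * (m + n - 2)
                 + (m - 1) * (n - 1) * ((m - 1) ** 3 * (n - 2) + (n - 1) ** 3 * (m - 2)
                                        + (m * n - m - n)),
                 2 * N ** 4)
    return var_factor, A, B


for m, n in [(2, 2), (3, 5), (10, 10)]:
    print(m, n, moments_direct(m, n) == moments_closed_form(m, n))
```

As a further consistency check, consider normally distributed $x$, for which $\mu_4=3\sigma^4$. Then (32) gives $M_4=3(A+2B)\sigma^4=3\big(\sum w_i^2\big)^2\sigma^4=3\sigma'^4$, which is the normal fourth moment of $y_{\mu\nu}$.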

Using (18') we find finally

$$\operatorname{Var}\Big(\sum_{\mu}\sum_{\nu}(x_{\mu\nu}-a_\mu-a_\nu+a)^2\Big)=\frac{mn-1}{mn}\big\{(mn-1)M_4-(mn-3)\sigma'^4\big\},$$

where $M_4$ and $\sigma'^2$ are the expressions just computed. Compare the variances of the test functions $s_a^2/(m-1)$ and $S^2/(m-1)(n-1)$. The variance of the first expression is of order $1/m$, whereas that of the second is of order $1/mn$. Hence for large values of $n$ the latter expression is more exact than the former (see the analogous remark in Section 1 (c)). A similar statement can be made if $s_\alpha^2/(n-1)$ takes the place of $s_a^2/(m-1)$.


3. Bivariate distributions. Analysis of covariance. 


(a). Problem. Suppose $m$ persons each throw two dice $n$ times; we observe the respective numbers on each die in those $mn$ trials. Or we observe, on $m$ groups of $n$ persons, the color of the hair and of the eyes. Or else we record, for $m$ farms over $n$ years, the yield of wheat (in bushels) per acre and the production cost (per bushel); etc.

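We consider $mn$ pairs of numbers $x_{\mu\nu}, y_{\mu\nu}$. Let $V_{\mu\nu}(x,y)$ be the probability that $x_{\mu\nu}\le x$ and $y_{\mu\nu}\le y$. Let $V_{\mu\nu}(x,+\infty)=V^{(1)}_{\mu\nu}(x)$ and $V_{\mu\nu}(+\infty,y)=V^{(2)}_{\mu\nu}(y)$, and introduce the following mean values and variances:

$$\iint x\,dV_{\mu\nu}(x,y)=\alpha_{\mu\nu},\qquad \iint y\,dV_{\mu\nu}(x,y)=\beta_{\mu\nu}, \tag{1}$$

$$\iint (x-\alpha_{\mu\nu})^2\,dV_{\mu\nu}(x,y)=\sigma^2_{\mu\nu},\qquad \iint (y-\beta_{\mu\nu})^2\,dV_{\mu\nu}(x,y)=\tau^2_{\mu\nu}, \tag{2}$$

$$\iint (x-\alpha_{\mu\nu})(y-\beta_{\mu\nu})\,dV_{\mu\nu}(x,y)=\gamma_{\mu\nu}, \tag{3}$$

$$\begin{aligned} \frac1n\sum_{\nu}\alpha_{\mu\nu}&=\alpha_\mu, & \frac1m\sum_{\mu}\alpha_{\mu\nu}&=\alpha_\nu, & \frac{1}{mn}\sum_{\mu}\sum_{\nu}\alpha_{\mu\nu}&=\alpha,\\ \frac1n\sum_{\nu}\beta_{\mu\nu}&=\beta_\mu, & \frac1m\sum_{\mu}\beta_{\mu\nu}&=\beta_\nu, & \frac{1}{mn}\sum_{\mu}\sum_{\nu}\beta_{\mu\nu}&=\beta. \end{aligned}\tag{4}$$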

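Let us compute the mathematical expectations of certain test functions with respect to the $2mn$-dimensional distribution¹¹

$$V_{11}(x_{11},y_{11})\,V_{12}(x_{12},y_{12})\cdots V_{mn}(x_{mn},y_{mn}).$$

Let

$$\mathcal{E}[F(x_{11},y_{11},\dots,x_{mn},y_{mn})]=\int\!\cdots\!\int F(x_{11},y_{11},\dots,x_{mn},y_{mn})\,dV_{11}(x_{11},y_{11})\cdots dV_{mn}(x_{mn},y_{mn}). \tag{5}$$

¹¹ In the particular case where $V_{\mu\nu}(x,y)$ has everywhere a derivative $\partial^2V_{\mu\nu}/\partial x\,\partial y$, we can use the two-dimensional density $v_{\mu\nu}(x,y)=\partial^2V_{\mu\nu}/\partial x\,\partial y$ and the one-dimensional densities $v^{(1)}_{\mu\nu}(x)=\int v_{\mu\nu}(x,y)\,dy$ and $v^{(2)}_{\mu\nu}(y)=\int v_{\mu\nu}(x,y)\,dx$, and we have $\alpha_{\mu\nu}=\int x\,v^{(1)}_{\mu\nu}(x)\,dx$ and $\beta_{\mu\nu}=\int y\,v^{(2)}_{\mu\nu}(y)\,dy$.

We then have¹²

$$\mathcal{E}[G(x_{11},\dots,x_{mn})]=\int\!\cdots\!\int G(x_{11},\dots,x_{mn})\,dV^{(1)}_{11}(x_{11})\cdots dV^{(1)}_{mn}(x_{mn}). \tag{5'}$$

In analogy with previous notations we introduce

$$\begin{aligned} a_\mu&=\frac1n\sum_{\nu}x_{\mu\nu}, & a_\nu&=\frac1m\sum_{\mu}x_{\mu\nu}, & a&=\frac{1}{mn}\sum_{\mu}\sum_{\nu}x_{\mu\nu},\\ b_\mu&=\frac1n\sum_{\nu}y_{\mu\nu}, & b_\nu&=\frac1m\sum_{\mu}y_{\mu\nu}, & b&=\frac{1}{mn}\sum_{\mu}\sum_{\nu}y_{\mu\nu}, \end{aligned}\tag{6}$$

and

$$\begin{aligned} s^2&=\sum\sum(x_{\mu\nu}-a)^2, & s_a^2&=n\sum(a_\mu-a)^2, & s_\mu^2&=\sum\sum(x_{\mu\nu}-a_\mu)^2,\\ S^2&=\sum\sum(x_{\mu\nu}-a_\mu-a_\nu+a)^2, & s_\alpha^2&=m\sum(a_\nu-a)^2, & s_\nu^2&=\sum\sum(x_{\mu\nu}-a_\nu)^2,\\ t^2&=\sum\sum(y_{\mu\nu}-b)^2, & t_b^2&=n\sum(b_\mu-b)^2, & t_\mu^2&=\sum\sum(y_{\mu\nu}-b_\mu)^2,\\ T^2&=\sum\sum(y_{\mu\nu}-b_\mu-b_\nu+b)^2, & t_\beta^2&=m\sum(b_\nu-b)^2, & t_\nu^2&=\sum\sum(y_{\mu\nu}-b_\nu)^2, \end{aligned}\tag{7}$$

and

$$\begin{aligned} c&=\sum\sum(x_{\mu\nu}-a)(y_{\mu\nu}-b), & C&=\sum\sum(x_{\mu\nu}-a_\mu-a_\nu+a)(y_{\mu\nu}-b_\mu-b_\nu+b),\\ c_a&=n\sum(a_\mu-a)(b_\mu-b), & c_\mu&=\sum\sum(x_{\mu\nu}-a_\mu)(y_{\mu\nu}-b_\mu),\\ c_\alpha&=m\sum(a_\nu-a)(b_\nu-b), & c_\nu&=\sum\sum(x_{\mu\nu}-a_\nu)(y_{\mu\nu}-b_\nu); \end{aligned}\tag{8}$$

we then have

$$\begin{aligned} s^2&=S^2+s_a^2+s_\alpha^2=s_a^2+s_\mu^2=s_\alpha^2+s_\nu^2,\\ t^2&=T^2+t_b^2+t_\beta^2=t_b^2+t_\mu^2=t_\beta^2+t_\nu^2,\\ c&=C+c_a+c_\alpha=c_a+c_\mu=c_\alpha+c_\nu, \end{aligned}\tag{9}$$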

and corresponding relations hold for the ranks of these quadratic forms. For the expectations of these test functions we find, in analogy with previously investigated formulae:

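$$\mathcal{E}\Big[\frac{t^2}{mn-1}\Big]=\frac{1}{mn}\sum\sum\tau^2_{\mu\nu}+\frac{1}{mn-1}\sum\sum(\beta_{\mu\nu}-\beta)^2,\qquad \mathcal{E}\Big[\frac{t_b^2}{m-1}\Big]=\frac{1}{mn}\sum\sum\tau^2_{\mu\nu}+\frac{n}{m-1}\sum(\beta_\mu-\beta)^2,$$

and

$$\mathcal{E}\Big[\frac{c}{mn-1}\Big]=\frac{1}{mn}\sum\sum\gamma_{\mu\nu}+\frac{1}{mn-1}\sum\sum(\alpha_{\mu\nu}-\alpha)(\beta_{\mu\nu}-\beta),\qquad \mathcal{E}\Big[\frac{c_a}{m-1}\Big]=\frac{1}{mn}\sum\sum\gamma_{\mu\nu}+\frac{n}{m-1}\sum(\alpha_\mu-\alpha)(\beta_\mu-\beta).$$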


¹² It may be mentioned that the problem of $mn$ bivariate distributions $v_{\mu\nu}(x,y)$ considered in this section constitutes, of course, only a particular case of dependence (see Section 2, (c)) for a $2mn$-dimensional population $v(x_{11},y_{11},x_{12},y_{12},\dots,x_{mn},y_{mn})$.






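1) If all the $\alpha_{\mu\nu}$ equal each other, or all the $\beta_{\mu\nu}$ equal each other, we find:

$$\mathcal{E}\Big[\frac{c}{mn-1}\Big]=\mathcal{E}\Big[\frac{c_a}{m-1}\Big]=\mathcal{E}\Big[\frac{c_\alpha}{n-1}\Big]=\mathcal{E}\Big[\frac{C}{(m-1)(n-1)}\Big]=\mathcal{E}\Big[\frac{c_\mu}{m(n-1)}\Big]=\mathcal{E}\Big[\frac{c_\nu}{n(m-1)}\Big]=\frac{1}{mn}\sum\sum\gamma_{\mu\nu}. \tag{12}$$

These formulae provide us with unbiased estimates of $\sum\sum\gamma_{\mu\nu}$.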

2) The $\alpha_{\mu\nu}$ are equal within each row but differ from row to row (Lexis), $\alpha_{\mu\nu}=\alpha_\mu$, $\alpha_\mu\neq\alpha$, whereas the $\beta_{\mu\nu}$ may have arbitrary values; then

$$\mathcal{E}\Big[\frac{c_\alpha}{n-1}\Big]=\mathcal{E}\Big[\frac{C}{(m-1)(n-1)}\Big]=\mathcal{E}\Big[\frac{c_\mu}{m(n-1)}\Big]. \tag{13}$$

The same equalities are valid for arbitrary $\alpha_{\mu\nu}$ if $\beta_{\mu\nu}=\beta_\mu$, $\beta_\mu\neq\beta$. Our new equalities may be of some interest because inequalities analogous to those of the Lexis case cannot be proved for covariances. If the observed values of the expressions in (13) are significantly different, we may conclude that neither the $\alpha_{\mu\nu}$ nor the $\beta_{\mu\nu}$ form a Lexis series. A judgment of the test (13) might be based on an investigation of its power function. Besides, we have the equalities (12) and analogous equalities containing the $t$'s and $T^2$.

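3) If either $\alpha_{\mu\nu}=\alpha_\nu$, $\alpha_\nu\neq\alpha$, $\alpha_\mu=\alpha$, or $\beta_{\mu\nu}=\beta_\nu$, $\beta_\nu\neq\beta$, $\beta_\mu=\beta$, we have the new equalities

$$\mathcal{E}\Big[\frac{c_a}{m-1}\Big]=\mathcal{E}\Big[\frac{C}{(m-1)(n-1)}\Big]=\mathcal{E}\Big[\frac{c_\nu}{n(m-1)}\Big], \tag{14}$$

and there are no inequalities analogous to the inequalities (14) of Section 2 and (13), (14) of Section 1.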

Most of the investigations of Sections 1 and 2 can be generalized for this two 
dimensional problem. 
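The decompositions (9) are algebraic identities, so they hold exactly for any pair of tables, whatever the distributions. The minimal sketch below uses made-up $3\times4$ tables and exact `Fraction` arithmetic. The helper name is hypothetical, and the keys `c_alpha`, `c_mu`, `c_nu` stand for $c_\alpha$, $c_\mu$, $c_\nu$ of (8). Passing the same table twice yields the sums of squares of (7) instead.

```python
from fractions import Fraction


def sums_of_products(x, y):
    """The bilinear sums c, C, c_a, c_alpha, c_mu, c_nu of (8) for two m x n tables;
    with y = x they reduce to s^2, S^2, s_a^2, s_alpha^2, s_mu^2, s_nu^2 of (7)."""
    m, n = len(x), len(x[0])
    mean = lambda values: sum(values, Fraction(0)) / len(values)
    a_mu, b_mu = [mean(r) for r in x], [mean(r) for r in y]
    a_nu = [mean([x[i][j] for i in range(m)]) for j in range(n)]
    b_nu = [mean([y[i][j] for i in range(m)]) for j in range(n)]
    a, b = mean(a_mu), mean(b_mu)
    cells = [(i, j) for i in range(m) for j in range(n)]
    return {
        "c": sum((x[i][j] - a) * (y[i][j] - b) for i, j in cells),
        "C": sum((x[i][j] - a_mu[i] - a_nu[j] + a) * (y[i][j] - b_mu[i] - b_nu[j] + b)
                 for i, j in cells),
        "c_a": n * sum((a_mu[i] - a) * (b_mu[i] - b) for i in range(m)),
        "c_alpha": m * sum((a_nu[j] - a) * (b_nu[j] - b) for j in range(n)),
        "c_mu": sum((x[i][j] - a_mu[i]) * (y[i][j] - b_mu[i]) for i, j in cells),
        "c_nu": sum((x[i][j] - a_nu[j]) * (y[i][j] - b_nu[j]) for i, j in cells),
    }


# Hypothetical tables with m = 3 rows and n = 4 columns.
x = [[Fraction(v) for v in r] for r in [[3, 1, 4, 1], [5, 9, 2, 6], [5, 3, 5, 8]]]
y = [[Fraction(v) for v in r] for r in [[2, 7, 1, 8], [2, 8, 1, 8], [2, 8, 4, 5]]]
p = sums_of_products(x, y)
print(p["c"] == p["C"] + p["c_a"] + p["c_alpha"] == p["c_a"] + p["c_mu"] == p["c_alpha"] + p["c_nu"])  # True
```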

BIBLIOGRAPHY

[1] R. A. Fisher, Statistical Methods for Research Workers, 6th ed., p. 214 ff.

[2] R. A. Fisher, "Applications of 'Student's' distribution," Metron, Vol. 6 (1926), pp. 90-104.

[3] H. Geiringer, "A new explanation of non-normal dispersion in the Lexis theory," Econometrica, Vol. 10 (1942), pp. 53-60.

[4] F. R. Helmert, Zeits. für Math. und Physik, Vol. 21 (1876), pp. 192-218.

[5] J. O. Irwin, "Mathematical theorems involved in the analysis of variance," Jour. Roy. Stat. Soc., Vol. 94 (1931), pp. 284-300.

[6] W. G. Madow, "Limiting distributions of quadratic and bilinear forms," Annals of Math. Stat., Vol. 11 (1940), pp. 125-147.

[7] R. v. Mises, "Théorie des probabilités. Fondements et applications," Annales de l'Institut Poincaré, (1931), pp. 137-190.

[8] H. L. Rietz, "On the Lexis theory and the analysis of variance," Bull. Amer. Math. Soc., (1932), pp. 731 ff.

[9] A. A. Tschuprow, Skandinavisk Aktuarietidskrift, Vol. 6 (1918).

[10] A. Wald, Lectures on the Analysis of Variance and Covariance, Columbia University, 1941.

[11] Milton Friedman, "The use of ranks to avoid the assumption of normality," Jour. Amer. Stat. Assn., Vol. 32 (1937), pp. 675-701.







Contents

On the Ratio of the Variances of Two Normal Populations. Henry Scheffé ... 371
Setting of Tolerance Limits When the Sample is Large. Abraham Wald ... 389
Statistical Prediction with Special Reference to the Problem of Tolerance Limits. S. S. Wilks ... 400
Generalized Poisson Distribution. Franklin E. Satterthwaite ... 410
The Construction of Orthogonal Latin Squares. Henry B. Mann ... 418
A Method of Determining Explicitly the Coefficients of the Characteristic Equation. P. A. Samuelson ... 424

Notes:

A Note on the Theory of Moment Generating Functions. J. H. ... 430
On the Power Function of the Analysis of Variance Test. Abraham Wald ... 434
A Note on the Estimation of Some Mean Values for a Bivariate Distribution. Edward Paulson ... 440
Significance Levels for the Ratio of the Mean Square Successive Difference to the Variance. B. I. Hart ... 446
A Correction. M. A. Girshick ... 447

Report of the Poughkeepsie Meeting ...
Abstracts of Papers ...

Vol. XIII, No. 4, December, 1942













ON THE RATIO OF THE VARIANCES OF TWO NORMAL POPULATIONS 

By Henry Scheffé

Princeton University 

CONTENTS

Page

1. Introduction and summary ... 371

Part I. Significance tests and confidence intervals based on the F-distribution

2. The F-distribution ... 372
3. Use of one tail ... 374
4. Symmetry condition ... 375
5. Logarithmically shortest confidence intervals ... 375
6. Reciprocal limits ... 376
7. The likelihood ratio ... 377
8. Equal tails ... 377
9. Comparison of the tests and confidence intervals ... 378

Part II. Significance tests and confidence intervals based on any similar regions

10. Common best critical regions ... 382
11. Type $B_1$ region ... 383
12. Neyman's categories of confidence intervals ... 386


1. Introduction and summary. Suppose that we have two samples $E_1$ and $E_2$ from two normal populations with unknown means and variances. Let us designate by $\theta$ the ratio of the variance of the first population to that of the second. The two problems discussed in this paper are to formulate, in terms of $E_1$ and $E_2$, and to compare:

(i) significance tests for the hypothesis that the unknown ratio $\theta$ is equal to a given positive number $\theta_0$, and

(ii) confidence intervals for $\theta$.

On the one hand, these problems are of considerable importance to the practical statistician and the teacher of statistics. On the other, they cry out for the application of recently developed theory which is unfortunately not yet familiar to many practical workers and teachers. The development has therefore been divided into two parts. Part I, it is hoped, will be intelligible to the above class of readers; Part II, slanted toward a smaller circle, is more esoteric, general, and condensed.

More specifically, Part I points out that any choice of limits on the F-distribution satisfying the condition that the sum of the areas in the tails be equal to a prescribed number leads to solutions of problems (i) and (ii). We first consider, and then rule out, the "one-sided" situations in which it is appropriate to use only one tail. Two conditions are then proposed (ad hoc and on an intuitive basis) for the "two-sided" case: a symmetry condition, and a condition for logarithmically shortest confidence intervals. The second condition leads to a choice of limits on the F-distribution. From other considerations,



reciprocal limits, likelihood ratio, and equal tails, other choices are advanced. It is found that all four of these choices satisfy the first condition. Furthermore, if $N_1=N_2$, where $N_i$ is the number of variates in $E_i$, the four choices become identical. If $N_1\neq N_2$, which of the four tests is "best"? Which of the four sets of confidence intervals? Just a little of the Neyman-Pearson theory of testing hypotheses suffices to define and answer the first question in a logically satisfactory way. The second calls for Neyman's theory of confidence intervals, and because of its greater difficulty this has been relegated to Part II. However, the limits determined by the criterion that the test be unbiased turn out to be the same as those which yield optimum confidence intervals from the elementary viewpoint of §5. Their numerical values are unfortunately laborious to calculate accurately if $N_1\neq N_2$. Part I therefore concludes with some numerical evidence indicating the loss of efficiency in using instead the easily found "equal tails" limits; for $N_1$ and $N_2\ge10$ this loss is seen to be quite small. It will perhaps bear repeating that if $N_1=N_2$, the "equal tails" limits on the F-distribution are the same as those associated with the unbiased test. Hence, in this case, all the advantages uncovered in Parts I and II for the unbiased test and the related confidence intervals are obtained by using the easily available "equal tails" limits.

In Part II we drop the restriction that the tests be based on a one- or two-tailed use of the F-distribution. By a slight extension of results of Neyman and Pearson, common best critical regions are found for testing the hypothesis $\theta=\theta_0$ against alternatives $\theta<\theta_0$, or against $\theta>\theta_0$. Since the regions are always distinct for these two "one-sided" cases, there is no uniformly most powerful test. To find the most efficient unbiased test, some recently published theorems of the writer are applied to prove that the critical region of the unbiased test proposed in Part I is of type $B_1$.

The results summarized in the above paragraph are obtained for arbitrary positive $\theta_0$. This will immediately suggest to the reader familiar with Neyman's theory of confidence intervals that it may be easy, on the basis of those results, to draw conclusions about the existence of Neyman's various categories of confidence intervals. It is. In particular, we find that the set of confidence intervals arrived at in §5 constitutes Neyman's short unbiased set.

The writer is aware that not all the results of this paper are new, and hopes he has given credit where it is due. He believes it desirable, however, to bring together all the results, old and new, in this attempt to clean up the problems (i) and (ii). He is pleased to acknowledge his debt to Mr. David Votaw for aiding in the calculations for Fig. 1 and for finding the formulas (6).

Part I. Significance Tests and Confidence Intervals Based on the F-Distribution

2. The F-distribution. The sample $E_i:(x_{i1},x_{i2},\dots,x_{iN_i})$, $i=1,2$, is assumed to be from a normal population with mean $a_i$ and variance $\sigma_i^2$. We write $\theta=\sigma_1^2/\sigma_2^2$, and might regard the statistic $T$ as an estimate¹ of $\theta$, where $T=s_1^2/s_2^2$ and

$$s_i^2=\sum_{j=1}^{N_i}(x_{ij}-\bar x_i)^2/n_i,\qquad \bar x_i=\sum_{j=1}^{N_i}x_{ij}/N_i,\qquad n_i=N_i-1.$$

It will be convenient to consider $\theta$, $\sigma_2^2$, $a_1$, $a_2$ as the population parameters. The remaining parameter $\sigma_1^2$ is eliminated from the joint p.d.f. (probability density function) of $E_1$ and $E_2$ by the substitution $\sigma_1^2=\theta\sigma_2^2$. For any given positive number $\theta_0$ we define the composite hypothesis

$$H_0:\ \theta=\theta_0,\quad 0<\sigma_2^2<+\infty,\quad -\infty<a_1<+\infty,\quad -\infty<a_2<+\infty.$$

In Hotelling's apt terminology the last three parameters are nuisance parameters. It is well known that $U_1$ and $U_2$, where $U_i=n_is_i^2/\sigma_i^2$, are independently distributed according to $\chi^2$-laws with $n_1$ and $n_2$ degrees of freedom respectively. Hence the quotient $F=(U_1/n_1)\div(U_2/n_2)=T/\theta$ has the F-distribution $h_{n_1n_2}(F)\,dF$ with $n_1$ and $n_2$ degrees of freedom, where

$$h_{n_1n_2}(u)=\frac{\Gamma\big(\frac12(n_1+n_2)\big)}{\Gamma(\frac12n_1)\,\Gamma(\frac12n_2)}\Big(\frac{n_1}{n_2}\Big)^{\frac12n_1}u^{\frac12n_1-1}\Big(1+\frac{n_1}{n_2}u\Big)^{-\frac12(n_1+n_2)},\qquad 0\le u\le\infty.$$

For later reference we note the following. If we define the variable $x$ by

$$\frac{n_1u}{n_2}=\frac{x}{1-x}, \tag{1}$$

then the cumulative distribution function of $x$ is the incomplete Beta function² $I_x(\frac12n_1,\frac12n_2)$.

Let $\alpha$ be any number such that $0<\alpha<1$ ($\alpha$ will be the significance level for (i), and $1-\alpha$ the confidence coefficient for (ii)). The symbols $A_{n_1n_2}$, $B_{n_1n_2}$ will always denote a pair of numbers for which³

$$\int_{A_{n_1n_2}}^{B_{n_1n_2}}h_{n_1n_2}(u)\,du=1-\alpha. \tag{2}$$

Every choice of the pair $A$, $B$ leads to a solution of problems (i) and (ii):

(i). A test of $H_0$ at significance level $\alpha$ consists of rejecting $H_0$ if $T<A_{n_1n_2}\theta_0$ or $T>B_{n_1n_2}\theta_0$.

The probability of rejecting $H_0$ if it is true is

$$1-\Pr(A\theta_0\le T\le B\theta_0\mid\theta_0)=1-\Pr(A\le T/\theta_0\le B\mid\theta_0)=\alpha,$$

independently of the true values of the nuisance parameters.


¹ Biased.

² All the results of this paper pertaining to the F-distribution could of course be stated in terms of Fisher's z-distribution [2] or the incomplete Beta distribution. The F-distribution is used here because of its popularity in applied statistics, and because it permits the simplest statements of the solutions of problems (i) and (ii).

³ Superscripts on $A$, $B$ will signify that a further condition has been laid on the pair $A$, $B$. The subscripts will be dropped when there is no danger of confusion. We permit $B=\infty$ as a possible choice.





(ii). A set of confidence intervals for $\theta$ with confidence coefficient $1-\alpha$ is⁴

$$T/B_{n_1n_2}\le\theta\le T/A_{n_1n_2}.$$

The probability that the true value of $\theta$ will be covered by the above random interval is

$$\Pr(T/B\le\theta\le T/A\mid\theta)=\Pr(A\le T/\theta\le B\mid\theta)=1-\alpha,$$

whatever the true values of $\theta$ and the nuisance parameters.

It will be convenient to adopt a brief notation for the tests and confidence intervals determined by certain choices of the limits $A$, $B$. In the sequel we shall denote these choices by $A^i_{n_1n_2}$, $B^i_{n_1n_2}$, where $i=\mathrm{I},\mathrm{II},\dots,\mathrm{VI}$. We shall call the significance test based on the pair $A^i$, $B^i$ the test $i$. The set of confidence intervals based on this pair we call the set $i$ of confidence intervals or, more briefly, the confidence intervals $i$.


3. Use of one tail. Suppose we do not mind accepting $H_0$ if the true value of $\theta$ exceeds $\theta_0$, but we desire a test which is as sensitive as possible in rejecting $H_0$ when $\theta<\theta_0$. It can be shown (for $n_2>2$) that the expected value of $T$ is $\mathcal{E}(T)=n_2\theta/(n_2-2)$. Hence, when the true value of $\theta$ is small compared with $\theta_0$, so is $\mathcal{E}(T)$. By the usual intuitive considerations we are led to reject $H_0$ if $F=T/\theta_0$ falls in the left tail of the F-distribution. To make the significance level equal to $\alpha$, we take the limits $A$, $B$ so that

$$\int_0^{A^{\mathrm I}_{n_1n_2}}h_{n_1n_2}(u)\,du=\alpha,\qquad B^{\mathrm I}_{n_1n_2}=\infty.$$

Similarly, to test $H_0$ against alternatives $\theta>\theta_0$, we define test II by

$$A^{\mathrm{II}}_{n_1n_2}=0,\qquad \int_{B^{\mathrm{II}}_{n_1n_2}}^{\infty}h_{n_1n_2}(u)\,du=\alpha.$$

Why test I is best for testing $H_0$ against alternatives $\theta<\theta_0$, and test II for $\theta>\theta_0$, will be explained more convincingly in §9.

The confidence intervals I and II are then semi-infinite. If we are not loath to accept large values of $\theta$ but wish to exclude the largest possible interval of small values $(0,T/B)$, we should evidently use the set II. Indeed, the set II is optimum when we are willing to accept values of $\theta$ larger than the true value but desire the highest possible probability of excluding any values less than the true value. The precise formulation and proof of this statement, however, must be postponed to Part II. Analogous remarks apply to the set I and a willingness to accept values of $\theta$ less than the true value.

For $\alpha=.05$ or $.01$ the values of $B^{\mathrm{II}}_{n_1n_2}$ are given in Snedecor's F-tables [12; same $n_1$, $n_2$ as ours]. The values of $A^{\mathrm I}_{n_1n_2}$ may be calculated from the same tables by using the relation

$$A^{\mathrm I}_{n_1n_2}=1/B^{\mathrm{II}}_{n_2n_1}. \tag{3}$$

$A^{\mathrm I}_{n_1n_2}$ for $\alpha=.50, .25, .10, .025, .005$ may be obtained by use of the transformation (1) and Thompson's new tables [13] of percentage points of the incomplete Beta distribution. $B^{\mathrm{II}}_{n_1n_2}$ for these values of $\alpha$ can then be found from (3).

⁴ If $B=\infty$ we omit the equality sign to the left of $\theta$; if $A=0$, the equality sign to the right of $\theta$.

4. Symmetry condition. We now restrict our attention (until §9) to the "two-sided" situation, in which we are interested in all alternatives to $\theta=\theta_0$ on the range $0<\theta<\infty$. Let us contemplate the following symmetry condition:

$$A_{n_1n_2}=1/B_{n_2n_1} \tag{4}$$

for all positive integers $n_1$, $n_2$. The desirability of this condition and of that of §5 follows not from mathematical principles but from practical considerations. These might be relevant whenever significance tests or confidence intervals are considered for a parameter $\theta$ which is the quotient of two other positive parameters $\theta_1$ and $\theta_2$, and the estimate of $\theta$ is the quotient of the estimates of $\theta_1$ and $\theta_2$.

Suppose that, given the samples $E_1$ and $E_2$, computer $C_1$ labels them 1, 2, as we have. Using our test of §2, he rejects the hypothesis that $\sigma_1^2/\sigma_2^2=k$ unless

$$A_{n_1n_2}k\le s_1^2/s_2^2\le B_{n_1n_2}k.$$

Computer $C_2$ labels them 2, 1 and, following a similar rule, rejects $\sigma_2^2/\sigma_1^2=1/k$ (in our notation) unless

$$A_{n_2n_1}/k\le s_2^2/s_1^2\le B_{n_2n_1}/k.$$

It will be seen that (4) is merely the condition that they reach the same conclusion. This makes life simpler, at least for computers and consulting statisticians. Likewise, if $C_1$ and $C_2$ use the confidence intervals of §2, they will make numerically equivalent statements about $\sigma_1^2/\sigma_2^2$ and $\sigma_2^2/\sigma_1^2$ if (4) is satisfied.

5. Logarithmically shortest confidence intervals. The length of the confidence intervals of §2 is $L=T(A^{-1}-B^{-1})$. We might consider choosing $A$, $B$ so that $\mathcal{E}(L)$ is a minimum. This leads to the problem of minimizing $A^{-1}-B^{-1}$ subject to (2). It might seem just as desirable, however, to minimize the expected length of the confidence interval for $\theta^2$,

$$(T/B)^2\le\theta^2\le(T/A)^2.$$

This leads to a different problem with a different solution.

The condition on confidence intervals for $\theta$ which appears intuitively desirable to the writer is the following. The limits $\underline\theta$, $\bar\theta$ of the confidence interval $\underline\theta(E_1,E_2)\le\theta\le\bar\theta(E_1,E_2)$ should be such that $\mathcal{E}(\log\bar\theta-\log\underline\theta)$ is a minimum. For the confidence intervals of §2 this is equivalent to minimizing $B/A$. By the method of Lagrange multipliers we easily find that

$$\big[u\,h_{n_1n_2}(u)\big]_{u=A}^{u=B}=0 \tag{5}$$

and (2) must be satisfied. Denote the solution⁵ by $A^{\mathrm{III}}_{n_1n_2}$, $B^{\mathrm{III}}_{n_1n_2}$. Evidently the same condition (5) is obtained if we ask for logarithmically shortest confidence intervals (based on the F-distribution) for $\theta^k$, where $k>0$.

The numerical values of the limits $A^{\mathrm{III}}$, $B^{\mathrm{III}}$ are difficult to calculate if $n_1\neq n_2$. The best procedure seems to be to transform to the incomplete Beta distribution by means of (1), and to calculate the corresponding points $a^{\mathrm{III}}_{n_1n_2}$, $b^{\mathrm{III}}_{n_1n_2}$ from the equations

$$\big[I_x(\tfrac12n_1,\tfrac12n_2)\big]_{x=a}^{x=b}=\big[I_x(\tfrac12n_1+1,\tfrac12n_2)\big]_{x=a}^{x=b}=1-\alpha. \tag{6}$$

The points $a$, $b$ can be found to two decimals by inspection of Pearson's tables [9]. Unfortunately, in the many cases where $a$ is close to 0, or $b$ to 1, the values of $A^{\mathrm{III}}$, $B^{\mathrm{III}}$ calculated from (1) are then subject to enormous error.
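Condition (5) has a simple reading. Since $u\,h_{n_1n_2}(u)$ is the density of $\log F$ at $\log u$, (5) asks for the shortest interval for $\log F$ with probability $1-\alpha$, that is, one whose end points have equal density. Moreover, $\log(u\,h)=\frac12n_1\log u-\frac12(n_1+n_2)\log(1+n_1u/n_2)+\text{const}$ is largest at $u=1$, so $A^{\mathrm{III}}<1<B^{\mathrm{III}}$. The sketch below is a numerical shortcut, not the tabulation method described above. It solves (2) and (5) by nested bisection, and a self-contained incomplete Beta function (continued fraction, modified Lentz method) stands in for Pearson's tables. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for I_x(a, b) (modified Lentz method).
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2, delta = 2 * k, 1.0
        for aa in (k * (b - k) * x / ((a - 1.0 + k2) * (a + k2)),
                   -(a + k) * (a + b + k) * x / ((a + k2) * (a + 1.0 + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(u, n1, n2):
    """Pr(F <= u) for h_{n1 n2}, via transformation (1): x = n1 u / (n1 u + n2)."""
    if u <= 0.0 or math.isinf(u):
        return 0.0 if u <= 0.0 else 1.0
    return betainc(0.5 * n1, 0.5 * n2, n1 * u / (n1 * u + n2))


def log_uh(u, n1, n2):
    """log(u h_{n1 n2}(u)) up to an additive constant; largest at u = 1."""
    return 0.5 * n1 * math.log(u) - 0.5 * (n1 + n2) * math.log1p(n1 * u / n2)


def partner_above_one(A, n1, n2):
    """The B > 1 with B h(B) = A h(A), for 0 < A < 1 (condition (5))."""
    target, lo, hi = log_uh(A, n1, n2), 1.0, 2.0
    while log_uh(hi, n1, n2) > target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if log_uh(mid, n1, n2) > target else (lo, mid)
    return 0.5 * (lo + hi)


def log_shortest_limits(alpha, n1, n2):
    """A^III, B^III satisfying (2) and (5), by bisection on A in (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        A = 0.5 * (lo + hi)
        B = partner_above_one(A, n1, n2)
        if f_cdf(B, n1, n2) - f_cdf(A, n1, n2) > 1.0 - alpha:
            lo = A   # interval still too wide: raise A (B then falls)
        else:
            hi = A
    A = 0.5 * (lo + hi)
    return A, partner_above_one(A, n1, n2)


A3, B3 = log_shortest_limits(0.05, 4, 12)
print(A3, B3, B3 / A3)
```

For $n_1=n_2$ the sketch returns limits with $A^{\mathrm{III}}B^{\mathrm{III}}=1$ up to rounding, in line with the statement that the two-tailed choices coincide in that case.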

6. Reciprocal limits. While problems (i) and (ii) are closely related, the last choice of limits was suggested solely by our consideration of (ii). Later we will reconsider this choice from the standpoint of (i); the reader may anticipate that it will again be found advantageous in some respect. For the present, we proceed to three further choices, which arise from various approaches to (i).

Several statistics manuals (see §8) recommend the following procedure for testing the hypothesis $\theta=1$: refer the quotient of the larger of $s_1^2$, $s_2^2$ by the smaller to tables. This suggests introducing a statistic $M$ defined as the maximum of $T$ and $T^{-1}$. Its distribution⁶ under the hypothesis $\theta=1$ is easily found. Let $g_{n_1n_2}(M)$ be its p.d.f. Then for $1\le u\le\infty$,

$$\begin{aligned} g_{n_1n_2}(u)\,du&=\Pr(u<M<u+du\mid\theta=1)\\ &=\Pr(u<T<u+du\ \text{or}\ u<T^{-1}<u+du)\\ &=\Pr(u<T<u+du)+\Pr(u<T^{-1}<u+du), \end{aligned}$$

since the last two terms are the probabilities of mutually exclusive events. Furthermore, the first term is $h_{n_1n_2}(u)\,du$. Because of the symmetry induced by $\theta=1$, we can evaluate the second term by merely interchanging subscripts. Hence the desired distribution is

$$g_{n_1n_2}(u)=h_{n_1n_2}(u)+h_{n_2n_1}(u),$$

regardless of the true values of the nuisance parameters.

⁵ It can be shown by elementary methods that the solution of these equations exists and is unique; likewise for the solutions later denoted by superscripts IV and V.

⁶ Considered by K. Pearson [8].





Suppose we reject the hypothesis $\theta=1$ if $M>M_{n_1n_2}$, where

$$\int_{M_{n_1n_2}}^{\infty}g_{n_1n_2}(u)\,du=\alpha.$$

This significance test is then easily shown to be the same as that of §2 with $\theta_0=1$, $B_{n_1n_2}=M_{n_1n_2}$ and

$$A_{n_1n_2}=B^{-1}_{n_1n_2}.$$

We remark that, again, these limits are not easy to compute if $n_1\neq n_2$. We shall call this choice of $A$, $B$ $A^{\mathrm{IV}}_{n_1n_2}$, $B^{\mathrm{IV}}_{n_1n_2}$. Although it has been motivated only for the case $\theta_0=1$, it leads of course to a test IV for any $\theta_0$ and to a set IV of confidence intervals.
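The reciprocal limits IV can be computed directly from the condition $\Pr(T>B)+\Pr(T<1/B)=\alpha$ at $\theta=1$. Because $g_{n_1n_2}=h_{n_1n_2}+h_{n_2n_1}$, this is the same as $\int_M^\infty g_{n_1n_2}(u)\,du=\alpha$. The sketch below finds $B^{\mathrm{IV}}$ by bisection, with the same kind of self-contained incomplete Beta function in place of tables. The function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for I_x(a, b) (modified Lentz method).
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2, delta = 2 * k, 1.0
        for aa in (k * (b - k) * x / ((a - 1.0 + k2) * (a + k2)),
                   -(a + k) * (a + b + k) * x / ((a + k2) * (a + 1.0 + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(u, n1, n2):
    """Pr(F <= u) for h_{n1 n2}, via transformation (1)."""
    if u <= 0.0 or math.isinf(u):
        return 0.0 if u <= 0.0 else 1.0
    return betainc(0.5 * n1, 0.5 * n2, n1 * u / (n1 * u + n2))


def reciprocal_limits(alpha, n1, n2):
    """A^IV = 1/B^IV, with Pr(T > B) + Pr(T < 1/B) = alpha when theta = 1."""
    def tail(B):  # decreasing in B for B >= 1, equal to 1 at B = 1
        return (1.0 - f_cdf(B, n1, n2)) + f_cdf(1.0 / B, n1, n2)
    lo, hi = 1.0, 2.0
    while tail(hi) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if tail(mid) > alpha else (lo, mid)
    B = 0.5 * (lo + hi)
    return 1.0 / B, B


A4, B4 = reciprocal_limits(0.05, 4, 12)
# B4 is also the upper alpha point of M, whose density is h_{n1 n2} + h_{n2 n1}:
print(B4, (1.0 - f_cdf(B4, 4, 12)) + (1.0 - f_cdf(B4, 12, 4)))
```

Since $g_{n_1n_2}=g_{n_2n_1}$, the computed $B^{\mathrm{IV}}$ does not depend on the order of $n_1$, $n_2$. Together with $A^{\mathrm{IV}}=1/B^{\mathrm{IV}}$, this is the symmetry condition (4).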



7. The likelihood ratio. The properties of $\lambda$-criteria in general have received much attention in the literature, and the $\lambda$-test for $H_0$ in particular is equivalent to a certain choice of $A$, $B$. We shall therefore mention it here, and see in §9 whether it has any advantages. $\lambda$ for $H_0$ in the case $\theta_0=1$ was given by Pearson and Neyman [7; their $H_1$, $n_t$, $s_t^2$, $\theta$, $\lambda_{H_1}$ are our $H_0$, $N_t$, $s_t^2(N_t-1)/N_t$, $N_1(N_2-1)/\{N_2(N_1-1)T\}$, $\lambda$]. For any $\theta_0$ it may be shown to be

$$\lambda=C_{n_1n_2}\,F^{3/2}\Big(1+\frac{n_1}{n_2}F\Big)^{-1}h_{n_1n_2}(F).$$

On considering the (bell-shaped) graph of $\lambda$ against $F$, we see that $\lambda<\lambda_0$ corresponds to two intervals, say $0\le F<F'$ and $F''<F\le\infty$. The $\lambda$-test consists of rejecting $H_0$ when $\lambda<\lambda_0$, where $\lambda_0$ is determined so that the significance level is $\alpha$. It is thus equivalent to test V with $A^{\mathrm V}_{n_1n_2}$, $B^{\mathrm V}_{n_1n_2}$ satisfying (2) and

$$\Big[u^{3/2}\Big(1+\frac{n_1}{n_2}u\Big)^{-1}h_{n_1n_2}(u)\Big]_{u=A}^{u=B}=0.$$

8. Equal tails. Perhaps the most venerable procedure for determining limits on a distribution for a significance test in a "two-sided" case is to choose them so that the tails of the distribution have equal areas. Define $A^{\mathrm{VI}}_{n_1n_2}$, $B^{\mathrm{VI}}_{n_1n_2}$ by

$$\int_0^{A^{\mathrm{VI}}_{n_1n_2}}h_{n_1n_2}(u)\,du=\int_{B^{\mathrm{VI}}_{n_1n_2}}^{\infty}h_{n_1n_2}(u)\,du=\tfrac12\alpha.$$

The values of $B^{\mathrm{VI}}_{n_1n_2}$ for $\alpha=.10$ and $.02$ are given in the F-tables [12; same $n_1$, $n_2$ as ours] as 5% and 1% points. The relation

$$A^{\mathrm{VI}}_{n_1n_2}\,B^{\mathrm{VI}}_{n_2n_1}=1 \tag{7}$$

is easy to obtain, and hence $A^{\mathrm{VI}}_{n_1n_2}$ for these values of $\alpha$ may also be calculated from the F-tables. The limits for $\frac12\alpha=.25, .10, .025, .005$ can be calculated from (1), (7), and Thompson's tables [13].
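The relations (3) and (7) can be checked numerically without tables. The sketch below computes the one-tail limits $A^{\mathrm I}$, $B^{\mathrm{II}}$ and the equal-tails limits $A^{\mathrm{VI}}$, $B^{\mathrm{VI}}$ as quantiles of $h_{n_1n_2}$, found by bisection on a self-contained incomplete Beta function. It is an illustration only; the function names are hypothetical.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for I_x(a, b) (modified Lentz method).
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for k in range(1, max_iter + 1):
        k2, delta = 2 * k, 1.0
        for aa in (k * (b - k) * x / ((a - 1.0 + k2) * (a + k2)),
                   -(a + k) * (a + b + k) * x / ((a + k2) * (a + 1.0 + k2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0 if x <= 0.0 else 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_cdf(u, n1, n2):
    """Pr(F <= u) for h_{n1 n2}, via transformation (1)."""
    if u <= 0.0 or math.isinf(u):
        return 0.0 if u <= 0.0 else 1.0
    return betainc(0.5 * n1, 0.5 * n2, n1 * u / (n1 * u + n2))


def f_quantile(p, n1, n2):
    """u with Pr(F <= u) = p, by bisection."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f_cdf(mid, n1, n2) < p else (lo, mid)
    return 0.5 * (lo + hi)


def one_tail_limits(alpha, n1, n2):
    """A^I (left tail of area alpha) and B^II (right tail of area alpha)."""
    return f_quantile(alpha, n1, n2), f_quantile(1.0 - alpha, n1, n2)


def equal_tail_limits(alpha, n1, n2):
    """A^VI, B^VI: each tail of h_{n1 n2} has area alpha / 2."""
    return f_quantile(0.5 * alpha, n1, n2), f_quantile(1.0 - 0.5 * alpha, n1, n2)


A1, B2 = one_tail_limits(0.05, 5, 10)
A1_rev, B2_rev = one_tail_limits(0.05, 10, 5)
A6, B6 = equal_tail_limits(0.10, 5, 10)
A6_rev, B6_rev = equal_tail_limits(0.10, 10, 5)
print(B2)             # upper 5% point of F with 5 and 10 degrees of freedom
print(A1 * B2_rev)    # relation (3): 1 up to rounding error
print(A6 * B6_rev)    # relation (7): 1 up to rounding error
```

The tabulated 5% point for $n_1=5$, $n_2=10$ is about 3.33, and `B2` should reproduce it. Note also that $B^{\mathrm{VI}}$ for $\alpha=.10$ coincides with $B^{\mathrm{II}}$ for $\alpha=.05$, which is why the 5% table serves the equal-tails test at the 10% level.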





Since test VI will later be seen to have some merit, we will discuss it somewhat further at this point. In several statistics texts [e.g., 3, 14] the student is told to take the quotient of the larger by the smaller of $s_1^2$, $s_2^2$, refer it to the $F$-table, taking the $n_1$ of the table to be the $n$ of the numerator, and to reject the null hypothesis $\theta = 1$ if the sample value is larger than the tabulated value. It is then further stated without proof that in using the 5% or 1% points of the $F$-table, the significance level is actually 10% or 2%. Since the quotient thus referred to the table is precisely the statistic $M$ of §6, it would seem logical to refer it to an $M$-table rather than the $F$-table! However, the above procedure can be justified⁷ as follows: Equation (7) tells us that test VI fulfills the symmetry condition (4). It then makes no difference in his conclusions whether the computer uses the statistic $s_1^2/s_2^2$ and the distribution $h_{n_1 n_2}(F)$ or $s_2^2/s_1^2$ and $h_{n_2 n_1}(F)$. In particular he may always use the larger ratio and $h_{mn}(F)$, where $m$ and $n$ are the "degrees of freedom" of numerator and denominator, respectively. Since this statistic cannot fall in the lower tail, he need consider only whether the calculated value exceeds the tabulated. But in using the value tabulated as the upper $p$% point of the $F$-distribution, he makes his test at the $2p$% significance level.
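A small simulation makes the last point concrete. The sketch below, in standard-library Python with hypothetical function names, applies the textbook rule with nominal 5% points to pairs of normal samples drawn with equal variances and counts how often it rejects. By the argument above, the rate should come out near 10% rather than 5%. The sample sizes, seed, and number of replications are arbitrary illustrative choices.

```python
import math
import random


def _tiny(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # I_x(a, b) = 1 - I_{1-x}(b, a)
    c, d = 1.0, 1.0 / _tiny(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _tiny(1.0 + num * d)
            c = _tiny(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_cdf(u, n1, n2):
    """Pr(F <= u) for F with (n1, n2) degrees of freedom."""
    return betainc(n1 / 2, n2 / 2, n1 * u / (n1 * u + n2)) if u > 0 else 0.0


def f_ppf(p, n1, n2):
    """Percentage point of F(n1, n2), by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p and hi < 1e300:
        hi *= 2.0
    for _ in range(120):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_square(xs):
    """s^2 with divisor (number of observations - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def textbook_level(N1, N2, p, reps=20000, seed=12345):
    """Monte Carlo rejection rate of the 'larger over smaller' rule when theta = 1.

    The larger mean square goes in the numerator and is compared with the
    upper p point of F for (numerator df, denominator df).
    """
    n1, n2 = N1 - 1, N2 - 1
    crit_12 = f_ppf(1 - p, n1, n2)   # used when s1^2 is the larger
    crit_21 = f_ppf(1 - p, n2, n1)   # used when s2^2 is the larger
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        s1 = mean_square([rng.gauss(0.0, 1.0) for _ in range(N1)])
        s2 = mean_square([rng.gauss(0.0, 1.0) for _ in range(N2)])
        if s1 >= s2:
            rejections += s1 / s2 > crit_12
        else:
            rejections += s2 / s1 > crit_21
    return rejections / reps


if __name__ == "__main__":
    # Samples of 11 and 21 observations (10 and 20 degrees of freedom), nominal 5% points
    print(textbook_level(11, 21, 0.05))
```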

9. Comparison of the tests and confidence intervals. We now have at hand two one-tailed and four two-tailed tests, and corresponding sets of confidence intervals, all based on the $F$-distribution. We note at this point that all four of the two-tailed tests satisfy the symmetry condition (4), and that in the special case $n_1 = n_2$ these four tests become identical. In comparing any two tests, an instrument which makes their relative advantages intuitively evident (anschaulich) is the power curve (surface in a more complicated case). The definition and interpretation of the power curve of a test are based on the insight of Neyman and Pearson [5] that two types of error are possible in applying a test: we may (I) reject the hypothesis when it is true, or (II) accept it when it is false.

We see immediately that for any test of the class considered in §2, the probability of a type I error is the same, namely $\alpha$. To find the probability of a type II error, let us introduce a little more terminology: We denote by $E$ the sample point $(E_1, E_2)$ and by $w$ the region of sample space defined by

(8) $$T < A\theta_0 \quad\text{or}\quad T > B\theta_0.$$

$w$ is called the critical region of the test: the test rejects $H_0$ if and only if $E$ falls in $w$. The probability of this, which is called the power of the test, is

$$1 - \Pr(A\theta_0/\theta \le T/\theta \le B\theta_0/\theta \mid \theta, \sigma_2^2, a_1, a_2).$$

Since in the present case this happens to be completely independent of the true values of the nuisance parameters, even for $\theta \ne \theta_0$, let us write it as $P(w \mid \theta)$. Then

⁷ The writer is indebted to Mr. T. W. Anderson, Jr. for pointing out to him that it is not necessary to use the $M$-distribution.





(9) $$P(w \mid \theta) = 1 - \int_{A\theta_0/\theta}^{B\theta_0/\theta} h_{n_1 n_2}(u)\,du.$$

Finally, by the power curve of the test we mean simply the graph of the power $P(w \mid \theta)$ as a function of $\theta$.

We may now state the probability of a type II error: it is $1 - P(w \mid \theta)$, where necessarily $\theta \ne \theta_0$. Hence the ordinate on the power curve for $\theta \ne \theta_0$ is the probability of avoiding a type II error, while for $\theta = \theta_0$ it is the probability of making a type I error. By inspection of equation (9) we find that, barring the cases $B = \infty$ or $A = 0$ (tests I and II), $P(w \mid \theta) \to 1$ as $\theta \to 0$ or $\infty$. We calculate the derivative to be

(10) $$P'(w \mid \theta) = \Bigl[u\,h_{n_1 n_2}(u)/\theta\Bigr]_{u = A\theta_0/\theta}^{u = B\theta_0/\theta},$$


[Fig. 1. Power curves $P(w \mid \theta)$ of the tests, plotted against $\theta$.]


which is obviously continuous for $0 < \theta < \infty$. If we equate this to zero we find a unique solution for $\theta$, and hence the power curve has a single minimum point. In the exceptional case $B = \infty$ we see from (9) that $P(w \mid \theta)$ decreases monotonically from 1 to 0 as $\theta$ increases from 0 to $\infty$; in the case $A = 0$, $P(w \mid \theta)$ increases monotonically from 0 to 1. Some power curves⁸ are plotted in fig. 1.
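Equations (9) and (10) are straightforward to evaluate. The sketch below, in standard-library Python with hypothetical names, computes the power of test VI for $n_1 = 10$, $n_2 = 20$, $\alpha = .05$, $\theta_0 = 1$. These parameters are assumed to match the case of fig. 1. The sketch confirms that the power at $\theta_0$ equals $\alpha$, compares (10) with a numerical derivative, and locates the minimum of the curve on a grid. It is a numerical check under these assumptions, not a reproduction of the figure.

```python
import math


def _tiny(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # I_x(a, b) = 1 - I_{1-x}(b, a)
    c, d = 1.0, 1.0 / _tiny(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _tiny(1.0 + num * d)
            c = _tiny(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_cdf(u, n1, n2):
    """Pr(F <= u) for F with (n1, n2) degrees of freedom."""
    return betainc(n1 / 2, n2 / 2, n1 * u / (n1 * u + n2)) if u > 0 else 0.0


def f_pdf(u, n1, n2):
    """The F density h_{n1 n2}(u)."""
    if u <= 0.0:
        return 0.0
    a, b = n1 / 2, n2 / 2
    return math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + a * math.log(n1 / n2) + (a - 1.0) * math.log(u)
                    - (a + b) * math.log1p(n1 * u / n2))


def f_ppf(p, n1, n2):
    """Percentage point of F(n1, n2), by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p and hi < 1e300:
        hi *= 2.0
    for _ in range(120):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def power(theta, A, B, theta0, n1, n2):
    """(9): P(w | theta) for the critical region T < A theta0 or T > B theta0."""
    lo, hi = A * theta0 / theta, B * theta0 / theta
    return 1.0 - (f_cdf(hi, n1, n2) - f_cdf(lo, n1, n2))


def power_slope(theta, A, B, theta0, n1, n2):
    """(10): [u h(u) / theta] between u = A theta0/theta and u = B theta0/theta."""
    lo, hi = A * theta0 / theta, B * theta0 / theta
    return (hi * f_pdf(hi, n1, n2) - lo * f_pdf(lo, n1, n2)) / theta


if __name__ == "__main__":
    n1, n2, alpha, theta0 = 10, 20, 0.05, 1.0
    A, B = f_ppf(alpha / 2, n1, n2), f_ppf(1 - alpha / 2, n1, n2)   # test VI
    print("P(w | theta0) =", power(theta0, A, B, theta0, n1, n2))
    print("P'(w | theta0) =", power_slope(theta0, A, B, theta0, n1, n2))
    grid = [0.80 + 0.001 * i for i in range(401)]
    theta_min = min(grid, key=lambda t: power(t, A, B, theta0, n1, n2))
    print("grid minimum near theta =", theta_min)
```

The slope at $\theta_0$ printed here is not zero for $n_1 \ne n_2$. This is the bias of test VI that is discussed below.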

Always understanding by $w$ a region of the set defined by (8), and recalling the above interpretation of the ordinate on the power curve, we are led to ask whether there is not a $w$, say $w_0$, whose power curve nowhere drops below any other curve $P = P(w \mid \theta)$. (They all pass through $(\theta_0, \alpha)$.) The test based on such a region $w_0$ would be called uniformly most powerful (UMP) of the class considered, and obviously would be preferred under any circumstances. Alas, it does not exist. Perhaps some insight into the fact of the general non-existence of UMP tests can be gained by returning to fig. 1. While fig. 1 is for the case $n_1 = 10$, $n_2 = 20$, and $\alpha = .05$, the following remarks are valid for any $n_1$, $n_2$, $\alpha$: We note that for testing $H_0$ against alternatives $\theta < \theta_0$ test I is far superior to the other three; indeed, it is superior to any of the tests of the class defined by (8) in the sense that its power curve lies above that of any of the other tests.⁹ But for alternatives $\theta > \theta_0$, test I is seen to be very poor (the worst possible, it can be shown). Similar remarks apply to test II and the complementary alternatives. This constitutes the more convincing explanation promised in §2 of the superiority of tests I and II in the "one-sided" cases. Since the power curve of test I lies above all other power curves for $\theta < \theta_0$, and that of test II above all for $\theta > \theta_0$, it is now clear that there is no UMP test of the class considered.

⁸ Power curves for test V may be found in a paper by Brown [1]. It did not seem worthwhile to construct curves for test IV, since the limits are hard to compute, the test is biased, and it has little historical interest.

To cope with the commonly occurring situation where there is no UMP test, Neyman and Pearson [5] defined an unbiased test: one whose power curve has an absolute minimum at $\theta_0$. The desirability of an unbiased test in the "two-sided" case is evident when we note that if a test is biased, the probability that we accept the hypothesis $\theta = \theta_0$ is greater if $\theta$ has certain values $\theta \ne \theta_0$ than if $\theta = \theta_0$. To find which, if any, of our tests is unbiased, we equate expression (10) to zero for $\theta = \theta_0$. As a result we find¹⁰ the condition (5) which determines test III.
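At $\theta = \theta_0$, expression (10) vanishes exactly when $B\,h_{n_1 n_2}(B) = A\,h_{n_1 n_2}(A)$. The limits of test III can therefore be found numerically by pairing this condition with the size requirement $\int_A^B h_{n_1 n_2}(u)\,du = 1-\alpha$. The sketch below, in standard-library Python with hypothetical names, does this by nested bisection. It assumes, as the text indicates, that the root is unique. For $n_1 = n_2$ it should reproduce the equal-tails limits, since the four two-tailed tests then coincide.

```python
import math


def _tiny(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # I_x(a, b) = 1 - I_{1-x}(b, a)
    c, d = 1.0, 1.0 / _tiny(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _tiny(1.0 + num * d)
            c = _tiny(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_cdf(u, n1, n2):
    """Pr(F <= u) for F with (n1, n2) degrees of freedom."""
    return betainc(n1 / 2, n2 / 2, n1 * u / (n1 * u + n2)) if u > 0 else 0.0


def f_pdf(u, n1, n2):
    """The F density h_{n1 n2}(u)."""
    if u <= 0.0:
        return 0.0
    a, b = n1 / 2, n2 / 2
    return math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + a * math.log(n1 / n2) + (a - 1.0) * math.log(u)
                    - (a + b) * math.log1p(n1 * u / n2))


def f_ppf(p, n1, n2):
    """Percentage point of F(n1, n2), by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p and hi < 1e300:
        hi *= 2.0
    for _ in range(120):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def unbiased_limits(n1, n2, alpha, iters=60):
    """(A_III, B_III) with F(B) - F(A) = 1 - alpha and B h(B) = A h(A).

    Assumes, as stated in the text, that the root in A is unique.
    """
    def upper_for(A):
        # B for which the acceptance interval (A, B) has probability 1 - alpha
        return f_ppf(f_cdf(A, n1, n2) + 1.0 - alpha, n1, n2)

    def gap(A):
        # (10) at theta = theta0, multiplied by theta0
        B = upper_for(A)
        return B * f_pdf(B, n1, n2) - A * f_pdf(A, n1, n2)

    lo = 1e-12                                  # gap(lo) > 0
    hi = f_ppf(alpha, n1, n2) * (1.0 - 1e-6)    # gap(hi) < 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    return A, upper_for(A)


if __name__ == "__main__":
    A, B = unbiased_limits(10, 20, 0.05)
    print(f"A_III = {A:.4f}, B_III = {B:.4f}")
    A2, B2 = unbiased_limits(10, 10, 0.05)
    print(A2 * B2)   # n1 = n2: coincides with the equal-tails limits, so close to 1
```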

We see now that the limits $A_{\rm III}$, $B_{\rm III}$ yield the preferred test in the "two-sided" case, as well as the logarithmically shortest confidence intervals. However, as pointed out in §5, the numerical values of these limits are difficult to calculate, and the question then arises: do we lose much by using instead the easily obtained "equal tails" limits $A_{\rm VI}$, $B_{\rm VI}$? In the case $n_1 = 10$, $n_2 = 20$, $\alpha = .05$, fig. 1 shows that the power curves of tests III and VI differ very little. The extent of the bias of test VI for other values of $n_1$, $n_2$, and $\alpha = .05$, $.01$ is indicated in table I. (The missing diagonal entries are all 1, 5 or 1, 1.) Let us call the entries $\bar\theta$, $100\bar\alpha$, where $\bar\theta = \theta_{\min}/\theta_0$ and $\bar\alpha = P(w_{\rm VI} \mid \theta_{\min})$. From (10) and (1) we get the following formula for computing $\bar\theta$:

(3 - (ffi-«Q" ,/< "‘ + "’ > )/(Q - 1), 

where 

$$Q = \beta'/\beta,\qquad \beta = a/(1-a),\qquad \beta' = b/(1-b),$$

and $a$ and $1-b$ are the $100(\tfrac12\alpha)\%$ points on the incomplete Beta distribution for $\nu_1 = n_1$, $\nu_2 = n_2$, and $\nu_1 = n_2$, $\nu_2 = n_1$, respectively, in the notation of Thompson's tables [13]. $\bar\alpha$ may then be computed by transforming (9),

r -i(l + p/ar 1 

a * 1 - /»(i»x,in*) , 

_ L - d + p/q r 
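As a cross-check on table I, $\bar\theta$ can also be obtained directly by setting (10) equal to zero for the equal-tails limits. Write $c = \theta_0/\theta$ and use $u\,h_{n_1 n_2}(u) \propto u^{n_1/2}(n_2 + n_1 u)^{-(n_1+n_2)/2}$. The condition then becomes $n_2 + n_1 Bc = R(n_2 + n_1 Ac)$ with $R = (B/A)^{n_1/(n_1+n_2)}$, so that $\bar\theta = n_1(B - RA)/\{n_2(R - 1)\}$, and $\bar\alpha$ follows from (9). This form is derived directly from (10) and the $F$ density. It is not necessarily the arrangement printed above, and the sketch below (standard-library Python, hypothetical names) is meant as a check rather than as the author's procedure.

```python
import math


def _tiny(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # I_x(a, b) = 1 - I_{1-x}(b, a)
    c, d = 1.0, 1.0 / _tiny(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _tiny(1.0 + num * d)
            c = _tiny(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_cdf(u, n1, n2):
    """Pr(F <= u) for F with (n1, n2) degrees of freedom."""
    return betainc(n1 / 2, n2 / 2, n1 * u / (n1 * u + n2)) if u > 0 else 0.0


def f_ppf(p, n1, n2):
    """Percentage point of F(n1, n2), by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p and hi < 1e300:
        hi *= 2.0
    for _ in range(120):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def vi_minimum_point(n1, n2, alpha):
    """Return (theta_bar, alpha_bar) for the equal-tails test VI.

    (10) vanishes when B c h(B c) = A c h(A c), c = theta0/theta.  Since
    u h(u) is proportional to u^(n1/2) (n2 + n1 u)^(-(n1+n2)/2), this means
    n2 + n1 B c = R (n2 + n1 A c) with R = (B/A)^(n1/(n1+n2)).
    """
    A, B = f_ppf(alpha / 2, n1, n2), f_ppf(1 - alpha / 2, n1, n2)
    R = (B / A) ** (n1 / (n1 + n2))
    theta_bar = n1 * (B - R * A) / (n2 * (R - 1.0))      # theta_min / theta0
    c = 1.0 / theta_bar                                    # theta0 / theta_min
    alpha_bar = 1.0 - (f_cdf(B * c, n1, n2) - f_cdf(A * c, n1, n2))   # from (9)
    return theta_bar, alpha_bar


if __name__ == "__main__":
    for n1, n2 in [(10, 20), (20, 10), (10, 10)]:
        theta_bar, alpha_bar = vi_minimum_point(n1, n2, 0.05)
        print(n1, n2, round(theta_bar, 3), round(100 * alpha_bar, 2))
```

Interchanging $n_1$ and $n_2$ inverts $\bar\theta$ and leaves $\bar\alpha$ unchanged. When $n_1 = n_2$, the sketch returns $\bar\theta = 1$ and $\bar\alpha = \alpha$, in agreement with the remark on the diagonal entries of table I.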

⁹ The reader may prove this from (9) or note that it is a special case of the results of §10.

¹⁰ The equivalent condition on the incomplete Beta distribution was given by Pitman [10] for the case $\theta_0 = 1$.



TABLE I

Minimum points of power curves of test VI

The entries are $\theta_{\min}/\theta_0$, $100\,P(w_{\rm VI} \mid \theta_{\min})$;
roman type for $\alpha = .05$, bold face for $\alpha = .01$.


_ 

1 

2 

3 

5 

10 

20 

40 

∞



634, 

576, 

.559, 

mm 

man 

581, 

588, 



4.75 

4 47 

4.17 

HI 

Sill 

3 68 

3.61 

1 


.631, 

.577, 

.671, 

.696, 

.617, 

.630, 

.645, 



.946 

.883 

.808 

.740 

.705 

.687 

.670 


1.578, 


861, 

.779, 

.745, 

737, 




4 75 


4.93 

4.69 

4.44 

4 26 



2 










1.585, 


.866, 

.776, 

.749, 

.749, 




.946 


.982 

.928 

.863 

.804 

BhI 




1 161, 


.895, 

.838, 

.819, 

.812, 

.808, 



4.93 


4 92 

4.70 

4 51 

4.41 

4.29 

3 

1.734, 

1.170, 


.889, 

SOI 

.821, 

.819, 

.820, 


.883 

.982 


.978 

.917 

.B67 

.837 

.804 


1.789, 

1 284, 

1.117, 


927, 

.898, 

886 , 

00 



4 69 

4 92 


4.92 

4.78 

4.67 

4.64 

5 

1.752, 

1.289, 

1.124, 


.924, 

.896, 

.887, 

.882, 


.808 

.928 

.978 




.903 

.864 


1.771. 

1.342, 

1.194, 

1.079, 

u| 

I 

.949, 

.941, 



4 44 


4.92 


gp.clll 

4.89 

4.76 

10 






1 ' 




1.682, 

1.335, 

1.198, 

1.083, 



.949, 

.937, 


.740 

.863 

.917 

.975 

HH 

■ 

.964 

.925 


1.742, 

1.357, 

1 .221, 

1 114, 

1.036, 


.983, 

.967, 


3.75 

4.26 

4 51 

4.78 

4.96 


4.98 

4.88 

20 










1.622, 

1.335, 

1.217, 

1.116, 

1.038, 


.963, 

.968, 


.706 

.804 

.867 

imM 

.987 


.993 

.960 


1.722, 

1.360, 

1 231, 

1.129, 

1 063, 

1.017, 


.984, 


.3.68 

4.15 

4.41 

4.67 

4.89 

4.98 


4.94 

40 

1.687, 

1.327, 

1 .221, 

1.127, 

Iff 

1.018, 


• Vvl) 


.687 

.778 

.837 

IhSI 

IKaal 

.993 


.980 


1 700, 

1.360, 

1.238, 

1.140, 

im 

um 

um 



3.61 

4 05 

4.29 

4.54 

4.76 

4.88 

4.94 


∞

1.549, 

1.315, 

1.219, 

1.134, 

1.067, 

um 

1.017, 



.670 

.761 

.804 

.864 

.925 

| .960 

.980 





and using Pearson's tables [9], or, when $x$ is very close to 0 or 1, using a few terms of the series

$$I_x(\tfrac12 m, \tfrac12 n) = \frac{2x^{\frac12 m}}{B(\tfrac12 m, \tfrac12 n)}\left[\frac{1}{m} - \frac{n-2}{2(m+2)}\,\frac{x}{1!} + \frac{(n-2)(n-4)}{2^2(m+4)}\,\frac{x^2}{2!} - \frac{(n-2)(n-4)(n-6)}{2^3(m+6)}\,\frac{x^3}{3!} + \cdots\right].$$



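In computing $\bar\theta$, $\bar\alpha$ it is perhaps simplest to take $n_1 > n_2$ and use the relationships

$$\bar\theta_{n_1 n_2}\,\bar\theta_{n_2 n_1} = 1,\qquad \bar\alpha_{n_1 n_2} = \bar\alpha_{n_2 n_1}.$$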
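The series for $I_x(\tfrac12 m, \tfrac12 n)$ given above is the term-by-term integral of the binomial expansion of $(1-t)^{\frac12 n - 1}$, and it terminates when $n$ is even. The sketch below, in standard-library Python with hypothetical function names, sums the series until the terms become negligible. For $x$ near 1 it uses $I_x(\tfrac12 m, \tfrac12 n) = 1 - I_{1-x}(\tfrac12 n, \tfrac12 m)$. For $m = n = 4$ the series reduces to the polynomial $3x^2 - 2x^3$, which gives a simple check.

```python
import math


def beta_series(x, m, n, rel_tol=1e-16, max_terms=1000):
    """I_x(m/2, n/2) from the series, intended for x near 0.

    2 x^(m/2) / B(m/2, n/2) * [1/m - (n-2) x / (2 (m+2) 1!)
                               + (n-2)(n-4) x^2 / (2^2 (m+4) 2!) - ...]
    """
    coef = 1.0        # (-1)^k (n-2)(n-4)...(n-2k) x^k / (2^k k!)
    total = 1.0 / m   # k = 0 term
    for k in range(1, max_terms):
        coef *= -(n - 2 * k) * x / (2.0 * k)
        term = coef / (m + 2 * k)
        total += term
        if abs(term) <= rel_tol * abs(total):   # also stops a terminating series
            break
    log_beta = math.lgamma(m / 2) + math.lgamma(n / 2) - math.lgamma((m + n) / 2)
    return 2.0 * x ** (m / 2) * math.exp(-log_beta) * total


def beta_series_near_one(x, m, n):
    """For x close to 1, use I_x(m/2, n/2) = 1 - I_{1-x}(n/2, m/2)."""
    return 1.0 - beta_series(1.0 - x, n, m)


if __name__ == "__main__":
    print(beta_series(0.3, 4, 4), 3 * 0.3 ** 2 - 2 * 0.3 ** 3)
    print(beta_series_near_one(0.9, 4, 4))
```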

When sample sizes $n_1 + 1$, $n_2 + 1$ are such that table I indicates a large bias, it might be worthwhile to get limits for an unbiased test from the "equal tails" limits as follows: The limits $A'_{\rm III}$, $B'_{\rm III}$ for an unbiased test may be obtained by taking

$$A'_{\rm III} = A_{\rm VI}/\bar\theta,\qquad B'_{\rm III} = B_{\rm VI}/\bar\theta,$$

but the test will then be at significance level $\bar\alpha$. The gain in using $A'_{\rm III}$, $B'_{\rm III}$ instead of $A_{\rm VI}$, $B_{\rm VI}$ is more apparent when we consider confidence intervals: the sets associated with $A'_{\rm III}$, $B'_{\rm III}$ and with $A_{\rm VI}$, $B_{\rm VI}$ have the same logarithmic lengths, but the confidence coefficients are $1 - \bar\alpha$ and $1 - \alpha$, respectively.
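The shifting device can be verified numerically. Dividing both equal-tails limits by $\bar\theta$ should:

- move the minimum of the power curve to $\theta_0$,
- leave $\log(B/A)$ unchanged, and
- give a test of level $\bar\alpha < \alpha$.

The sketch below (standard-library Python, hypothetical names) checks these three properties for $n_1 = 10$, $n_2 = 20$, $\alpha = .05$. It assumes that $\bar\theta$ is computed as the root of (10) for test VI, written as $n_1(B - RA)/\{n_2(R-1)\}$ with $R = (B/A)^{n_1/(n_1+n_2)}$.

```python
import math


def _tiny(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # I_x(a, b) = 1 - I_{1-x}(b, a)
    c, d = 1.0, 1.0 / _tiny(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _tiny(1.0 + num * d)
            c = _tiny(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_cdf(u, n1, n2):
    """Pr(F <= u) for F with (n1, n2) degrees of freedom."""
    return betainc(n1 / 2, n2 / 2, n1 * u / (n1 * u + n2)) if u > 0 else 0.0


def f_pdf(u, n1, n2):
    """The F density h_{n1 n2}(u)."""
    if u <= 0.0:
        return 0.0
    a, b = n1 / 2, n2 / 2
    return math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                    + a * math.log(n1 / n2) + (a - 1.0) * math.log(u)
                    - (a + b) * math.log1p(n1 * u / n2))


def f_ppf(p, n1, n2):
    """Percentage point of F(n1, n2), by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p and hi < 1e300:
        hi *= 2.0
    for _ in range(120):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def shifted_unbiased_limits(n1, n2, alpha):
    """Return (A'_III, B'_III, alpha_bar): equal-tails limits divided by theta_bar."""
    A, B = f_ppf(alpha / 2, n1, n2), f_ppf(1 - alpha / 2, n1, n2)
    R = (B / A) ** (n1 / (n1 + n2))
    theta_bar = n1 * (B - R * A) / (n2 * (R - 1.0))   # root of (10) for test VI
    A_s, B_s = A / theta_bar, B / theta_bar
    alpha_bar = 1.0 - (f_cdf(B_s, n1, n2) - f_cdf(A_s, n1, n2))   # level at theta0
    return A_s, B_s, alpha_bar


if __name__ == "__main__":
    n1, n2, alpha = 10, 20, 0.05
    A_s, B_s, alpha_bar = shifted_unbiased_limits(n1, n2, alpha)
    print(A_s, B_s, alpha_bar)
    # (10) at theta = theta0 is proportional to B h(B) - A h(A); it should vanish here.
    print(B_s * f_pdf(B_s, n1, n2) - A_s * f_pdf(A_s, n1, n2))
```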

This seems to be about as far as it is worthwhile to carry the developments at the elementary level of part I. Some inadequacies may already have disturbed the reader: Why not consider, in place of the interval $(A, B)$ on the range of $F$, any measurable region¹¹ $R$ such that the integral of $h_{n_1 n_2}(F)$ over $R$ is $1 - \alpha$? Under the transformation $T = \theta_0 F$ the complement of $R$, just as the complement of $(A, B)$, would lead to critical regions $w$ for which $P(w \mid \vartheta) = \alpha$ for all values $\vartheta$ of the nuisance parameters. Critical regions satisfying the last condition are said to be similar to the sample space with regard to the nuisance parameters. More generally, how would our preferred tests I, II, III stand up if we admit for comparison tests based on any similar regions whatever? Finally, how can one formulate in a general way conditions for optimum confidence intervals, and would a more general formulation still lead to the preference of the sets I, II, III? Answers to these questions will be found in part II.


Part II. Significance Tests and Confidence Intervals Based on any Similar Regions

10. Common best critical regions. For the case $\theta_0 = 1$, Neyman and Pearson [6] have shown that the critical region of test I is the common best critical (CBC) region for testing $H_0$ against alternatives $\theta < \theta_0$. This result is easily extended to any $\theta_0$ by a simple device. We consider the following 1:1 transformations of variables and parameters:


¹¹ Our intuitions may balk at the notion of using sets $R$ more general than intervals, but it would nevertheless be reassuring to find that our tests can meet this competition.





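(11) $$x_{1j} = \theta_0^{1/2}x'_{1j},\quad x_{2k} = x'_{2k},\qquad j = 1, 2, \ldots, N_1;\ k = 1, 2, \ldots, N_2,$$

(12) $$\theta = \theta_0\theta',\quad \sigma_2^2 = (\sigma'_2)^2,\quad a_1 = \theta_0^{1/2}a'_1,\quad a_2 = a'_2.$$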

Denote by $E'_1$, $E'_2$, $E'$ the points corresponding to $E_1$, $E_2$, $E$, respectively, under the transformation (11); by $\vartheta$ any point in the space of the three nuisance parameters, and by $\vartheta'$ its correspondent under the transformation (12); and by $H'_0$ the transformed hypothesis, $H'_0$: $\theta' = 1$, $\vartheta'$ unspecified. If $w$ is any Borel-measurable region of the space of $E$, and $w'$ the map of $w$ under (11), then $\Pr(E \in w \mid \theta, \vartheta) = \Pr(E' \in w' \mid \theta', \vartheta')$, which we shall write as

(13) $$P(w \mid \theta, \vartheta) = P(w' \mid \theta', \vartheta').$$

We note that the coordinates of $E'_1$ are normally distributed with mean $a'_1$ and variance $(\sigma'_1)^2$, where $(\sigma'_1)^2 = \theta'(\sigma'_2)^2$, all $N_1 + N_2$ coordinates being statistically independent. Designating the critical region of test I by $w_0$, and its map under (11) by $w'_0$, the result of Neyman and Pearson may then be stated as follows: $w'_0$ is a CBC region for $H'_0$ and alternatives $\theta' < 1$. Now suppose $w_0$ were not a CBC region for $H_0$ and alternatives $\theta < \theta_0$. Then there would exist a region $w_1$, a value $\theta_1 < \theta_0$, and a point $\vartheta_1$ such that $P(w_1 \mid \theta_1, \vartheta_1) > P(w_0 \mid \theta_1, \vartheta_1)$, while $P(w_1 \mid \theta_0, \vartheta) = \alpha$ for all $\vartheta$. Let $w'_1$, $\theta'_1$, $\vartheta'_1$ correspond to $w_1$, $\theta_1$, $\vartheta_1$ under (11) and (12). Then from (13) we would have $P(w'_1 \mid \theta'_1, \vartheta'_1) > P(w'_0 \mid \theta'_1, \vartheta'_1)$, where $\theta'_1 < 1$, while $P(w'_1 \mid 1, \vartheta') = \alpha$ for all $\vartheta'$. But this would contradict the fact that $w'_0$ is a CBC region for $H'_0$ and alternatives $\theta' < 1$.

The proof that the critical region of test II is a CBC region for testing $H_0$ against alternatives $\theta > \theta_0$ is of course completely analogous. This establishes the non-existence of a UMP test for $H_0$, and so we consider next the existence of a "best" unbiased test.

11. Type $B_1$ region. This section is a direct application of a recent paper, "On the theory of testing composite hypotheses with one constraint," to which we shall refer as [11]. Since it is not feasible to restate here the definitions, assumptions, and theorems of [11], we shall refer to them by their numbers there. It is convenient to transform the parameters of the p.d.f. of $E$ by putting

(14) $$\theta = 1/\psi,\qquad \theta_0 = 1/\psi_0,\qquad \sigma_2^2 = 1/h.$$

Then

(15) $$p(E \mid \psi, h, a_1, a_2) = (2\pi)^{-N/2}\,\psi^{N_1/2}h^{N/2}\exp\Bigl\{-\tfrac12\bigl(\psi h[N_1(\bar x_1 - a_1)^2 + S_1] + h[N_2(\bar x_2 - a_2)^2 + S_2]\bigr)\Bigr\},$$

where

$$N = N_1 + N_2,\qquad S_i = n_i s_i^2.$$

We note that type $B$ and type $B_1$ regions (definitions 1, 2 in [11]) are invariant under certain transformations of parameters. Suppose new parameters $\theta'$, $\vartheta'$ are introduced by 1:1 transformations $\theta = \theta(\theta')$, $\vartheta = \vartheta(\vartheta')$. Let $\theta'_0$ correspond to $\theta_0$, and consider the transformed hypothesis $H'_0$: $\theta' = \theta'_0$; $\vartheta'$ unspecified. Sufficient conditions that a region be of type $B$ for testing $H_0$ if it is of type $B$ for testing $H'_0$ are that the function $\theta(\theta')$ have first and second derivatives and that the first not vanish at $\theta'_0$. The last statement remains true if $B$ is replaced by $B_1$. Since the transformations (14) satisfy these sufficient conditions, we define

$$H'_0:\ \psi = \psi_0;\quad \vartheta' = (h, a_1, a_2)\ \text{unspecified},$$

and propose to show that there exists a type $B_1$ region for testing $H'_0$, and that it is the critical region of test III.

For later reference we now note that the four functions of variables and 
parameters defined in Table II are mutually independently distributed as 
indicated there. 


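TABLE II

Function: Distribution

$u_1 = \psi h S_1 = S_1/\sigma_1^2$: $\chi^2$, with $n_1$ degrees of freedom
$u_2 = h S_2 = S_2/\sigma_2^2$: $\chi^2$, with $n_2$ degrees of freedom
$u_3 = (\psi h N_1)^{1/2}(\bar x_1 - a_1) = N_1^{1/2}(\bar x_1 - a_1)/\sigma_1$: normal, with zero mean and unit variance
$u_4 = (h N_2)^{1/2}(\bar x_2 - a_2) = N_2^{1/2}(\bar x_2 - a_2)/\sigma_2$: normal, with zero mean and unit variance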


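Let us first verify the critical assumption 3° of [11]. Identifying our $\psi$, $h$, $a_1$, $a_2$ with $\theta_1$, $\theta_2$, $\theta_3$, $\theta_4$ of [11], we find from (15) that

(16) $$\begin{aligned}
\phi_1 &= \tfrac12\bigl(N_1/\psi - h[N_1(\bar x_1 - a_1)^2 + S_1]\bigr),\\
\phi_2 &= \tfrac12\bigl(N/h - \psi[N_1(\bar x_1 - a_1)^2 + S_1] - [N_2(\bar x_2 - a_2)^2 + S_2]\bigr),\\
\phi_3 &= \psi h N_1(\bar x_1 - a_1),\\
\phi_4 &= h N_2(\bar x_2 - a_2),
\end{aligned}$$

and then check 3° by differentiating equations (16).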

To verify assumption 4°, let $z_1, z_2, z_3, z_4$ of [11] be our $x_{11}$, $x_{12}$, $x_{21}$, $x_{22}$, respectively. We calculate

$$\frac{\partial(\phi_1, \phi_2, \phi_3, \phi_4)}{\partial(x_{11}, x_{12}, x_{21}, x_{22})} = -\psi h^3(x_{11} - x_{12})(x_{21} - x_{22}),$$

which vanishes only on the same set of probability zero for all admissible values of the parameters. The validity of assumption 5° follows from §5 of [11], and there is no difficulty in verifying 1° and 2°.

To apply theorem 1 of [11] we must find functions $k_i(\phi_2, \phi_3, \phi_4; \psi_0, \vartheta')$, $i = 1, 2$, such that

(17) $$\int_{k_1}^{k_2}\phi_1^t\,p(\phi_1, \phi_2, \phi_3, \phi_4 \mid \psi_0, \vartheta')\,d\phi_1 = (1-\alpha)\int_{-\infty}^{\infty}\text{same},$$





for $t = 0, 1$, where the symbols $\phi_i$ henceforth are understood to stand for the functions (16) with $\psi$ replaced by $\psi_0$. If the functions $k_i$ exist, then the region in sample space defined by

(18) $$\phi_1 < k_1 \quad\text{or}\quad \phi_1 > k_2$$

is independent of $\vartheta'$ and of type $B$.

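From equations (16) and Table II we see that

(19) $$\phi_1 = \tfrac12(N_1 - w_1)/\psi_0,\quad \phi_2 = \tfrac12(N - w_2)/h,\quad \phi_3 = (\psi_0 h N_1)^{1/2}u_3,\quad \phi_4 = (h N_2)^{1/2}u_4,$$

where

$$w_1 = u_1 + u_3^2,\qquad w_2 = u_1 + u_2 + u_3^2 + u_4^2,$$

and $\psi$ is put equal to $\psi_0$ in $u_1$, $u_3$. Furthermore, for fixed $w_2$, $u_3$, $u_4$, the range of $w_1$ is

$$u_3^2 \le w_1 \le w_2 - u_4^2.$$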


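Transforming the integrals in (17) by substituting (19) and

$$p(w_1, w_2, u_3, u_4 \mid \psi_0, \vartheta') = \frac{p(u_1, u_2, u_3, u_4 \mid \psi_0, \vartheta')}{\partial(w_1, w_2, u_3, u_4)/\partial(u_1, u_2, u_3, u_4)},$$

where the p.d.f. in the numerator is, from Table II,

$$C\,u_1^{\frac12 n_1 - 1}u_2^{\frac12 n_2 - 1}\exp(-\tfrac12 w_2),$$

we get as the equivalent of (17)

$$\int_{\kappa_1}^{\kappa_2}(N_1 - w_1)^t(w_1 - u_3^2)^{\frac12 n_1 - 1}(w_2 - w_1 - u_4^2)^{\frac12 n_2 - 1}\,dw_1 = (1-\alpha)\int_{u_3^2}^{w_2 - u_4^2}\text{same},$$

where the limits $\kappa_i(w_2, u_3, u_4; \psi_0, \vartheta')$ are the values of $w_1$ corresponding to $\phi_1 = k_i(\phi_2, \phi_3, \phi_4; \psi_0, \vartheta')$.

Finally, we let

(20) $$x = (w_1 - u_3^2)/(w_2 - u_3^2 - u_4^2),$$

and get

$$\int_{k_1}^{k_2}\bigl[N_1 - u_3^2 - (w_2 - u_3^2 - u_4^2)x\bigr]^t x^{\frac12 n_1 - 1}(1-x)^{\frac12 n_2 - 1}\,dx = (1-\alpha)\int_0^1\text{same},$$

where $k_i(w_2, u_3, u_4; \psi_0, \vartheta')$ are the values of $x$ obtained by setting $w_1$ equal to $\kappa_i$ in (20). The last condition is equivalent to

(21) $$\int_{k_1}^{k_2}x^{\frac12 n_1 - 1 + t}(1-x)^{\frac12 n_2 - 1}\,dx = (1-\alpha)\int_0^1\text{same},\qquad t = 0, 1.$$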





Since $x$ is a continuous monotonic function of $\phi_1$, (18) becomes

(22) $$x < k_1 \quad\text{or}\quad x > k_2.$$

Solutions for the functions $k_1$, $k_2$ satisfying (21) exist in the form $k_i = $ constant. Indeed, if we now note that the $x$ defined by (20) is the same as that defined in (1), and let $k_1 = a$, $k_2 = b$, we see that the conditions (21) are identical with (6), and that our method of finding type $B$ regions has led us to the critical region of test III.

To show that the type $B$ region obtained from Theorem 1 of [11] is also of type $B_1$, we appeal to Theorem 2. From (15) we have

$$p(E \mid \psi, \vartheta')/p(E \mid \psi_0, \vartheta') = (\psi/\psi_0)^{\frac12 N_1}\exp\{(\psi - \psi_0)(\phi_1 - \tfrac12 N_1/\psi_0)\}.$$

Since for $\psi \ne \psi_0$ this function is convex in $\phi_1$, Theorem 2 is applicable. The result of this section is the conclusion that the critical region of test III is of type $B_1$ for testing $H_0$.

12. Neyman's categories of confidence intervals. The concepts and terminology of this section are those formulated in a basic paper [4] by Neyman. Suppose a distribution depends on a parameter $\theta$, and on further parameters $\theta_2, \theta_3, \ldots, \theta_l$ which we shall symbolize by $\vartheta$. The hypothesis

$$H(\theta_0):\ \theta = \theta_0;\ \vartheta\ \text{unspecified},$$

may be called a composite hypothesis with one constraint [11]. Let $E$ be the sample point, $W$ be the sample space, and $w$ be any Borel-measurable region in $W$. Write $\Pr(E \in w \mid \theta, \vartheta) = P(w \mid \theta, \vartheta)$. The condition that a critical region $w(\theta_0)$ for testing $H(\theta_0)$ be similar to $W$ with respect to $\vartheta$ is

(23) $$P(w(\theta_0) \mid \theta_0, \vartheta) = \alpha\quad\text{for all }\vartheta,$$

where $\alpha$ is fixed throughout our discussion. Suppose for every admissible $\theta_0$ there exists a similar region $w(\theta_0)$. The complementary region $A(\theta_0) = W - w(\theta_0)$ we may call a region of acceptance. For any $E$ we next define the linear set $\delta(E)$ of points on the $\theta$-axis as the totality of points $\theta$ such that $E \in A(\theta)$. The probability [4] that the random set $\delta(E)$ covers a value $\theta''$ if the true value of $\theta$ is $\theta'$ is

(24) $$\Pr\{\theta'' \in \delta(E) \mid \theta', \vartheta\} = 1 - P(w(\theta'') \mid \theta', \vartheta),$$

and hence from (23),

(25) $$\Pr\{\theta' \in \delta(E) \mid \theta', \vartheta\} = 1 - \alpha$$

for all $\theta'$, $\vartheta$, and we might call the aggregate $\{\delta(E)\}$ a set of confidence regions with confidence coefficient $1 - \alpha$. Now if all $\delta(E)$ are intervals, then they form a set of confidence intervals.
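For the tests of part I the sets $\delta(E)$ are intervals, because the acceptance region $A\theta \le T \le B\theta$ inverts to $T/B \le \theta \le T/A$. The sketch below, in standard-library Python with hypothetical names, uses the equal-tails limits of test VI. It checks by simulation that the resulting intervals cover the true $\theta$ with frequency close to $1 - \alpha$, whatever the means, which play the role of nuisance parameters. The sample sizes, standard deviations, means, and seed are illustrative assumptions.

```python
import math
import random


def _tiny(v):
    # Keep the continued fraction away from division by zero.
    return v if abs(v) > 1e-300 else 1e-300


def betainc(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) via a continued fraction."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    if x > (a + 1.0) / (a + b + 2.0):
        return 1.0 - betainc(b, a, 1.0 - x)  # I_x(a, b) = 1 - I_{1-x}(b, a)
    c, d = 1.0, 1.0 / _tiny(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 / _tiny(1.0 + num * d)
            c = _tiny(1.0 + num / c)
            h *= d * c
        if abs(d * c - 1.0) < 1e-15:
            break
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    return math.exp(log_front) * h / a


def f_cdf(u, n1, n2):
    """Pr(F <= u) for F with (n1, n2) degrees of freedom."""
    return betainc(n1 / 2, n2 / 2, n1 * u / (n1 * u + n2)) if u > 0 else 0.0


def f_ppf(p, n1, n2):
    """Percentage point of F(n1, n2), by bisection on f_cdf."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, n1, n2) < p and hi < 1e300:
        hi *= 2.0
    for _ in range(120):
        mid = 0.5 * (lo + hi)
        if f_cdf(mid, n1, n2) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_square(xs):
    """s^2 with divisor (number of observations - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def ratio_interval(x1, x2, A, B):
    """delta(E) = {theta : A theta <= T <= B theta} = [T/B, T/A], with T = s1^2/s2^2."""
    T = mean_square(x1) / mean_square(x2)
    return T / B, T / A


def coverage(N1, N2, sigma1, sigma2, alpha, reps=10000, seed=7):
    """Fraction of simulated intervals that contain theta = sigma1^2 / sigma2^2."""
    n1, n2 = N1 - 1, N2 - 1
    A, B = f_ppf(alpha / 2, n1, n2), f_ppf(1 - alpha / 2, n1, n2)   # test VI limits
    theta = (sigma1 / sigma2) ** 2
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1 = [rng.gauss(3.0, sigma1) for _ in range(N1)]    # the means are nuisance
        x2 = [rng.gauss(-1.0, sigma2) for _ in range(N2)]   # parameters (arbitrary)
        lo, hi = ratio_interval(x1, x2, A, B)
        hits += lo <= theta <= hi
    return hits / reps


if __name__ == "__main__":
    print(coverage(11, 21, 2.0, 1.0, 0.05))
```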

We have now shown that if $H(\theta_0)$ is a composite hypothesis with one constraint, if for every admissible $\theta_0$ there exists a similar region $w(\theta_0)$ for testing $H(\theta_0)$, and if the aggregate $\{\delta(E)\}$ determined by the family $\{w(\theta_0)\}$ consists of intervals $\delta(E)$, then $\{\delta(E)\}$ is a set of confidence intervals. By similar use of (24) the reader may prove that if furthermore each $w(\theta_0)$ of the family has the property P of the table below, then the corresponding set $\{\delta(E)\}$ of confidence intervals is of Neyman's category C.


P: property of $w(\theta_0)$ → C: category of $\{\delta(E)\}$

gives UMP test → shortest
CBC for $\theta > \theta_0$ (or $\theta < \theta_0$) → best one-sided
gives unbiased test → unbiased
of type $B$ → short unbiased
of type $B_1$ → shortest unbiased


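We have taken the liberty of calling a set of one-sided confidence intervals

$$\delta(E):\ \underline\theta(E) \le \theta \quad(\text{or } \theta \le \overline\theta(E)),$$

where $\underline\theta(E)$ and $\overline\theta(E)$ are Neyman's unique lower and upper estimates, respectively, best one-sided. We have also taken the liberty of calling a set $\{\delta_0(E)\}$ shortest unbiased if for all $\theta'$, $\vartheta$ it satisfies (25) and

(26) $$\bigl[\partial\Pr\{\theta' \in \delta_0(E) \mid \theta, \vartheta\}/\partial\theta\bigr]_{\theta = \theta'} = 0,$$

while for any other set $\{\delta_1(E)\}$ satisfying (25) and (26), and all $\theta''$, $\theta'$, $\vartheta$,

$$\Pr\{\theta'' \in \delta_0(E) \mid \theta', \vartheta\} \le \Pr\{\theta'' \in \delta_1(E) \mid \theta', \vartheta\}.$$

It follows immediately from this discussion that our sets II and I of confidence intervals are the best one-sided, and that the set III is not only a short, but the shortest, unbiased set.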

In conclusion, we remark that Neyman's concept of the "shortness" of a set of confidence intervals strikes one at first as indirect (to fully appreciate its elegance it is perhaps necessary to attempt the formulation of a general theory from a more naive approach), and that it is then of interest to discover that in the present case his short unbiased set coincides with that reached by the direct intuitive (but obviously extremely limited) method of §5.

REFERENCES 

[1] G. W. Brown, "On the power of the $L_1$ test for equality of several variances," Annals of Math. Stat., Vol. 10 (1939), p. 127.

[2] R. A. Fisher, "On a distribution yielding the error function of several well known statistics," Proc. Int. Math. Congress, Toronto, 1924, Vol. 2, p. 805.

[3] J. F. Kenney, Mathematics of Statistics, Part 2, New York, 1939, p. 144.

[4] J. Neyman, "Outline of a theory of statistical estimation based on the classical theory of probability," Phil. Trans. Roy. Soc. London, Ser. A, Vol. 236 (1937), pp. 333-380.

[5] J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses, Part I," Stat. Res. Mem., Vol. 1 (1936), pp. 1-37.

[6] J. Neyman and E. S. Pearson, "On the problem of the most efficient tests of statistical hypotheses," Phil. Trans. Roy. Soc. London, Ser. A, Vol. 231 (1933), pp. 289-337.

[7] E. S. Pearson and J. Neyman, "On the problem of two samples," Bull. Int. Acad. Polon. Sci. Let., Ser. A, 1930, p. 82.

[8] K. Pearson, S. A. Stouffer, and F. N. David, "Further applications in statistics of the $T_m(x)$ Bessel function," Biometrika, Vol. 24 (1932), pp. 306, 339, 340.

[9] K. Pearson (Editor), Tables of the Incomplete Beta-Function, Cambridge, 1934.

[10] E. J. G. Pitman, "Tests of hypotheses concerning location and scale parameters," Biometrika, Vol. 31 (1939), p. 207.

[11] H. Scheffé, "On the theory of testing composite hypotheses with one constraint," Annals of Math. Stat., Vol. 13 (1942), pp. 280-293.

[12] G. W. Snedecor, Statistical Methods, Ames, 1940, pp. 184-187.

[13] C. M. Thompson, "Tables of percentage points of the incomplete Beta function," Biometrika, Vol. 32 (1941), pp. 168-181.

[14] L. H. C. Tippett, The Methods of Statistics, London, 1937, p. 118.



SETTING OF TOLERANCE LIMITS WHEN THE SAMPLE IS LARGE 

By Abraham Wald 
Columbia University 

1. Introduction. Let $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ be the joint probability density function of the variates $x_1, \ldots, x_p$ involving $k$ unknown parameters $\theta_1, \ldots, \theta_k$. A sample of size $n$ is drawn from this population. Denote by $x_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, n$) the $\alpha$-th observation on $x_i$. We will deal here with the following two problems of setting tolerance limits, which are of importance in the mass production of a product:

Problem 1. For any two positive numbers $\beta < 1$ and $\gamma < 1$ we have to construct $p$ pairs of functions of the observations $L_i(x_{11}, \ldots, x_{pn})$ and $U_i(x_{11}, \ldots, x_{pn})$ ($i = 1, \ldots, p$) such that

(1) $$P\Bigl[\int_{L_1}^{U_1}\cdots\int_{L_p}^{U_p} f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)\,dx_1\cdots dx_p \ge \gamma \Bigm| \theta_1, \ldots, \theta_k\Bigr] = \beta,$$

where for any relation $R$, $P(R \mid \theta_1, \ldots, \theta_k)$ denotes the probability that $R$ holds, calculated under the assumption that $\theta_1, \ldots, \theta_k$ are the true values of the parameters.

Problem 2. For any positive numbers $\beta < 1$, $\lambda < 1$ and for any positive integer $N$ we have to construct $p$ pairs of functions of the observations $L_i(x_{11}, \ldots, x_{pn})$ and $U_i(x_{11}, \ldots, x_{pn})$ with the following property. Let $y_{i\alpha}$ ($i = 1, \ldots, p$; $\alpha = 1, \ldots, N$) be the $\alpha$-th observation on the variate $x_i$ in a second sample of size $N$ drawn from the same population as the first sample. Denote by $M$ the number of different values of $\alpha$ for which the $p$ inequalities

$$L_i(x_{11}, \ldots, x_{pn}) \le y_{i\alpha} \le U_i(x_{11}, \ldots, x_{pn})\qquad (i = 1, \ldots, p)$$

are fulfilled. Then

(2) $$P(M \ge \lambda N \mid \theta_1, \ldots, \theta_k) = \beta,$$

where $\theta_1, \ldots, \theta_k$ denote the unknown parameter values of the population from which the observations $x_{i\alpha}$ and $y_{i\alpha}$ have been drawn.

The functions $L_i$ and $U_i$ are called the tolerance limits for the variate $x_i$. We will say that $L_i$ is the lower, and $U_i$ the upper, tolerance limit of $x_i$. In general, there exist infinitely many tolerance limits $L_i$ and $U_i$ which are solutions of Problem 1 or Problem 2. It is clear that the tolerance limits $L_i$ and $U_i$ are the more favorable the smaller the difference $U_i - L_i$. Hence if there exist several solutions for the tolerance limits $L_i$ and $U_i$, we should select that one for which the difference $U_i - L_i$ becomes a minimum in some sense.

S. S. Wilks¹ gave a solution of Problems 1 and 2 in the univariate case, i.e., if $p = 1$. It seems that Wilks' solution is the best possible one if nothing is known about the probability density function except that it is continuous. However, if it is known a priori that the unknown density function is an element of a $k$-parameter family of functions, it will in general be possible to derive tolerance limits which are considerably better than those proposed by Wilks.

¹ S. S. Wilks, "Determination of sample sizes for setting tolerance limits," Annals of Math. Stat., Vol. 12 (1941). See also his paper on the same subject presented at the meeting of the Institute of Mathematical Statistics in Poughkeepsie, September, 1942.

Wilks' results can easily be extended to the multivariate case, provided the variates $x_1, \ldots, x_p$ are known to be independently distributed.² This is a serious restriction, since in many practical cases the independence of the variates $x_1, \ldots, x_p$ cannot be assumed. The case of dependent variates has not been treated by Wilks.

In this paper we give a solution of Problems 1 and 2 when the size $n$ of the sample is large. In the next section a lemma is proved which will be used in the derivation of tolerance limits. In section 3 the univariate case is treated, and in section 4 the results are extended to the multivariate case.


2. A lemma. We will prove the following

Lemma. Let $\{x_{1n}\}, \ldots, \{x_{rn}\}$ ($n = 1, 2, \ldots$, ad inf.) be $r$ sequences of random variables and let $a_1, \ldots, a_r$ be $r$ constants such that the joint distribution of $\sqrt n(x_{1n} - a_1), \ldots, \sqrt n(x_{rn} - a_r)$ converges with $n \to \infty$ towards the $r$-variate normal distribution with zero means and finite non-singular covariance matrix $\|\sigma_{ij}\|$ ($i, j = 1, \ldots, r$). Furthermore, let $g(u_1, \ldots, u_r)$ be a function of $r$ variables $u_1, \ldots, u_r$ which admits continuous first derivatives in the neighborhood of the point $u_1 = a_1, \ldots, u_r = a_r$. Assume that at least one of the first partial derivatives of $g(u_1, \ldots, u_r)$ is not zero at the point $u_1 = a_1, \ldots, u_r = a_r$. Then the distribution of $\sqrt n[g(x_{1n}, \ldots, x_{rn}) - g(a_1, \ldots, a_r)]$ converges with $n \to \infty$ towards the normal distribution with zero mean and variance

$$\sigma^2 = \sum_i\sum_j \sigma_{ij}\,g_i\,g_j,$$

where $g_i$ denotes the partial derivative of $g(u_1, \ldots, u_r)$ with respect to $u_i$ taken at $u_1 = a_1, \ldots, u_r = a_r$.

Proof: Since the joint distribution of $\sqrt n(x_{1n} - a_1), \ldots, \sqrt n(x_{rn} - a_r)$ approaches an $r$-variate normal distribution with zero means and finite non-singular covariance matrix, the probability that

(3) $$a_i - \frac{1}{\sqrt[4]{n}} < x_{in} < a_i + \frac{1}{\sqrt[4]{n}}\qquad (i = 1, \ldots, r)$$

holds converges to 1 with $n \to \infty$. From (3) and the continuity of the first derivatives of $g(u_1, \ldots, u_r)$ it follows easily that for any positive $\epsilon$ the probability that

(4) $$\sum_i \sqrt n(x_{in} - a_i)g_i - \epsilon < \sqrt n[g(x_{1n}, \ldots, x_{rn}) - g(a_1, \ldots, a_r)] < \sum_i \sqrt n(x_{in} - a_i)g_i + \epsilon$$

holds converges to 1 with $n \to \infty$. Since the limit distribution of $\sum_i \sqrt n(x_{in} - a_i)g_i$ is normal with zero mean and variance equal to $\sum_i\sum_j \sigma_{ij}g_ig_j$, our Lemma follows easily from the fact that the quantity $\epsilon$ in (4) can be chosen arbitrarily small.

² This was mentioned by Wilks in his paper presented at the meeting of the Institute of Mathematical Statistics in Poughkeepsie, N. Y., September, 1942.
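The Lemma is what is now usually called the delta method. The sketch below (standard-library Python, hypothetical names) checks it in one illustrative case that does not come from the paper:

- $r = 2$ and $g(u_1, u_2) = u_1 u_2$, with $a = (2, 3)$;
- a per-observation covariance matrix with $\sigma_{11} = 1$, $\sigma_{12} = 1$, $\sigma_{22} = 4$.

Then $g_1 = 3$ and $g_2 = 2$, and the Lemma predicts $\sigma^2 = 9 + 12 + 16 = 37$. Because the observations are assumed normal, the sample means are drawn directly from their exact normal distribution.

```python
import math
import random


def delta_method_check(n=400, reps=20000, seed=3):
    """Return (predicted, simulated) variance of sqrt(n)[g(xbar_1, xbar_2) - g(a_1, a_2)].

    Illustrative assumptions: g(u1, u2) = u1 * u2, a = (2, 3), and a
    per-observation covariance matrix [[1, 1], [1, 4]].
    """
    a1, a2 = 2.0, 3.0
    s11, s12, s22 = 1.0, 1.0, 4.0
    g1, g2 = a2, a1                      # dg/du1 = u2, dg/du2 = u1, taken at (a1, a2)
    predicted = s11 * g1 * g1 + 2.0 * s12 * g1 * g2 + s22 * g2 * g2
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    vals = []
    for _ in range(reps):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # (z1, z1 + sqrt(3) z2) has covariance [[1, 1], [1, 4]]; dividing by
        # sqrt(n) gives the exact law of normal sample means about (a1, a2).
        x1 = a1 + z1 / root_n
        x2 = a2 + (z1 + math.sqrt(3.0) * z2) / root_n
        vals.append(root_n * (x1 * x2 - a1 * a2))
    mean = sum(vals) / reps
    simulated = sum((v - mean) ** 2 for v in vals) / (reps - 1)
    return predicted, simulated


if __name__ == "__main__":
    print(delta_method_check())   # predicted 37.0; the simulated value should be close
```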


3. The univariate case. In this section we assume that $p = 1$. Hence the probability density function $f(x_1, \ldots, x_p, \theta_1, \ldots, \theta_k)$ is replaced by the univariate density function $f(x, \theta_1, \ldots, \theta_k)$. In order to simplify the notation, the letter $\theta$ without any subscript will be used to denote the set of parameter values $\theta_1, \ldots, \theta_k$.

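For any positive $\xi < 1$ let $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions of $\theta$ such that

(5) $$\int_{\phi(\theta,\xi)}^{\psi(\theta,\xi)} f(x, \theta)\,dx = \xi.$$

If $f(x, \theta)$ is a continuous function of $x$, functions $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ satisfying (5) exist. It is clear that for any function $\phi(\theta, \xi)$ subject to the condition

$$\int_{-\infty}^{\phi(\theta,\xi)} f(x, \theta)\,dx \le 1 - \xi$$

there exists a function $\psi(\theta, \xi)$ such that (5) holds. We will choose $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ so that (5) is satisfied and

(6) $$\psi(\theta, \xi) - \phi(\theta, \xi) \le \psi^*(\theta, \xi) - \phi^*(\theta, \xi)$$

for any value of $\theta$ and for any functions $\phi^*(\theta, \xi)$ and $\psi^*(\theta, \xi)$ which satisfy (5).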

Let $\hat\theta_i$ ($i = 1, \ldots, k$) be the maximum likelihood estimate of $\theta_i$ calculated from the observations $x_1, \ldots, x_n$. We propose the use of the tolerance limits

(7) $$L = \phi(\hat\theta, \xi)\quad\text{and}\quad U = \psi(\hat\theta, \xi),$$

where the value of the constant $\xi$ has to be properly determined. Problem 1 is solved if we can determine $\xi$ as a function of $\beta$ and $\gamma$ such that

(8) $$P\Bigl[\int_{\phi(\hat\theta,\xi)}^{\psi(\hat\theta,\xi)} f(x, \theta)\,dx \ge \gamma \Bigm| \theta\Bigr] = \beta.$$

Problem 2 is solved if we determine $\xi$ as a function of $\beta$, $\lambda$ and $N$ such that

(9) $$P(M \ge \lambda N \mid \theta) = \beta,$$

where $M$ denotes the number of observations in the second sample which lie between the tolerance limits $\phi(\hat\theta, \xi)$ and $\psi(\hat\theta, \xi)$. The use of tolerance limits of the form (7) seems to be well justified by the fact that the functions $\phi(\theta, \xi)$ and $\psi(\theta, \xi)$ satisfy (5) and (6) and that $\hat\theta_i$ is an optimum estimate of $\theta_i$ ($i = 1, \ldots, k$).

Now we will derive the large sample distribution of 



302 


ABRAHAM WALD 


/•*<#, 0 

1(3, 8, £) = / , /(ar, 5) dx. 

J «=f*. o 


( 10 ) 

We obviously have 

(ii) m 9 , & -1. 

We will assume that the limit joint distribution of s/n(di — ^i), • • • , 
\/n{h — 9k) is normal with mean values 0 and non-singular covariance matrix 

II 07 /( 6 ) II — || Cij(d) ||~ l where c,,( 0 ) denotes the expected value of ~ -.—Jilt-ill 

ddi 66$ 

(i, j = 1, ■ • • , k). This is known to be true if f(x, 9) satisfies some regularity 
conditions . 8 Furthermore we assume that <p($, £) and ^( 0 , 0 admit continuous 
first partial derivatives with respect to 0 i, ■ > • , 0* and that/(x, 6) is a continuous 
function of x in the neighborhood of x = <p(6, f) and x — ^(0, £). We have 


( 12 ) 


61(3, 0J) 


|<-« 60|- 
Assuming that at least one of the derivatives (12) is not zero, it follows from our Lemma that $\sqrt{n}\,[I(\hat\theta, \theta, \xi) - I(\theta, \theta, \xi)] = \sqrt{n}\,[I(\hat\theta, \theta, \xi) - \xi]$ is in the limit normally distributed with zero mean and variance

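$$\begin{aligned}
\sigma^2(\theta, \xi) ={}& \{f[\psi(\theta, \xi), \theta]\}^2 \sum_{i=1}^{k}\sum_{j=1}^{k} \frac{\partial\psi(\theta, \xi)}{\partial\theta_i}\,\frac{\partial\psi(\theta, \xi)}{\partial\theta_j}\,\sigma_{ij}(\theta) \\
&- 2 f[\psi(\theta, \xi), \theta]\, f[\varphi(\theta, \xi), \theta] \sum_{i=1}^{k}\sum_{j=1}^{k} \frac{\partial\psi(\theta, \xi)}{\partial\theta_i}\,\frac{\partial\varphi(\theta, \xi)}{\partial\theta_j}\,\sigma_{ij}(\theta) \\
&+ \{f[\varphi(\theta, \xi), \theta]\}^2 \sum_{i=1}^{k}\sum_{j=1}^{k} \frac{\partial\varphi(\theta, \xi)}{\partial\theta_i}\,\frac{\partial\varphi(\theta, \xi)}{\partial\theta_j}\,\sigma_{ij}(\theta).
\end{aligned} \tag{13}$$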
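Read as matrix algebra, (13) is the quadratic form $g^\top \|\sigma_{ij}(\theta)\|\, g$ in the vector $g$ whose $i$-th component is the derivative (12). Multiplying it out gives exactly the three double sums. The minimal Python sketch below evaluates it that way using only the standard library. The function names are illustrative and do not come from the paper. The caller is assumed to supply the density values at the two limits, the partial derivatives of $\varphi$ and $\psi$, and the covariance matrix as plain lists. The usage lines plug in the normal case worked out in the example at the end of this paper.

```python
import math
from statistics import NormalDist


def coverage_gradient(f_psi, f_phi, dpsi, dphi):
    """Components of (12): f[psi] * dpsi/dtheta_i - f[phi] * dphi/dtheta_i."""
    return [f_psi * a - f_phi * b for a, b in zip(dpsi, dphi)]


def sigma_squared(f_psi, f_phi, dpsi, dphi, cov):
    """sigma^2(theta, xi) of (13), evaluated as the quadratic form g' Sigma g."""
    g = coverage_gradient(f_psi, f_phi, dpsi, dphi)
    k = len(g)
    return sum(g[i] * g[j] * cov[i][j] for i in range(k) for j in range(k))


# Usage: normal density with theta = (mean, sd) = (0, 2) and xi = 0.9.
s, xi = 2.0, 0.9
rho = NormalDist().inv_cdf((1 + xi) / 2)     # half-width of the interval in sd units
dens = NormalDist(0.0, s).pdf(rho * s)        # f at psi; by symmetry also f at phi
cov = [[s * s, 0.0], [0.0, s * s / 2]]        # limit covariance of sqrt(n)(theta_hat - theta)
var = sigma_squared(dens, dens, [1.0, rho], [1.0, -rho], cov)
closed_form = rho ** 2 * math.exp(-rho ** 2) / math.pi  # the example's value of sigma^2
```

In this normal case the first component of $g$ vanishes, so only the estimate of the scale parameter contributes to $\sigma^2$.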

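For any positive $\beta < 1$ denote by $\lambda_\beta$ the value for which

$$\frac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-t^2/2}\,dt = \beta. \tag{14}$$

Then the probability that

$$I(\hat\theta, \theta, \xi) \ge \xi + \frac{\lambda_\beta\,\sigma(\theta, \xi)}{\sqrt{n}} \tag{15}$$

converges with $n \to \infty$ towards $\beta$.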
Let

$$\xi(\beta, \gamma, \theta) = \gamma - \frac{\lambda_\beta\,\sigma(\theta, \gamma)}{\sqrt{n}}. \tag{16}$$


³ See for instance J. L. Doob, "Probability and statistics," Trans. Amer. Math. Soc., October, 1934.



SETTING OP TOLERANCE LIMITS 


393 


If $\sigma(\theta, \xi)$ is continuous in $\theta$ and $\xi$, it follows easily from (15) that the probability that

$$I[\hat\theta, \theta, \xi(\beta, \gamma, \hat\theta)] \ge \gamma \tag{17}$$

holds converges to $\beta$ with $n \to \infty$. Hence we can summarize our results in the following

Theorem 1. Let $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions satisfying (5) and (6). Furthermore, let the functions $I(\hat\theta, \theta, \xi)$, $\sigma^2(\theta, \xi)$ and $\xi(\beta, \gamma, \theta)$ be defined by (10), (13) and (16) respectively. Denote by $\theta_1^0, \ldots, \theta_k^0$ the true values of the parameters. It is assumed that there exist two positive numbers $\epsilon$ and $\delta$ such that the following three conditions are fulfilled.

(a) For any point $\theta$ for which $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$, the limit joint distribution of $\sqrt{n}(\hat\theta_1 - \theta_1), \ldots, \sqrt{n}(\hat\theta_k - \theta_k)$, calculated under the assumption that $\theta$ is the true parameter point, is normal with zero means and a finite non-singular covariance matrix $\|\sigma_{ij}(\theta)\|$, where $\sigma_{ij}(\theta)$ is a continuous function of $\theta$ in the domain $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$.

(b) The partial derivatives $\partial\varphi(\theta, \xi)/\partial\theta_i$ and $\partial\psi(\theta, \xi)/\partial\theta_i$ $(i = 1, \ldots, k)$ are continuous functions of $\theta$ and $\xi$ in the domain $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$ and $|\xi - \gamma| < \delta$.

(c) At least one of the partial derivatives $\partial I(\hat\theta, \theta^0, \gamma)/\partial\hat\theta_i$ at $\hat\theta = \theta^0$ $(i = 1, \ldots, k)$ is not equal to zero.

Then the probability that $I[\hat\theta, \theta^0, \xi(\beta, \gamma, \hat\theta)] \ge \gamma$ holds converges to $\beta$ with $n \to \infty$.

From Theorem 1 we obtain the following

Large sample solution of Problem 1. For large $n$ we can approximate the lower and upper tolerance limits by $\varphi[\hat\theta, \xi(\beta, \gamma, \hat\theta)]$ and $\psi[\hat\theta, \xi(\beta, \gamma, \hat\theta)]$ respectively, where $\xi(\beta, \gamma, \theta)$ is given by (16).
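The recipe in (14) and (16) needs only a standard normal quantile. The upper tail beyond $\lambda_\beta$ has area $\beta$, so $\lambda_\beta$ is the $(1-\beta)$ quantile. That quantile is negative for $\beta > 1/2$, and $\xi(\beta, \gamma, \hat\theta)$ then exceeds $\gamma$: the nominal content is enlarged to allow for the error in $\hat\theta$.

The sketch below uses Python's `statistics.NormalDist`. The argument `sigma_at_gamma` stands for $\sigma(\hat\theta, \gamma)$ from (13) and is assumed to be computed by the caller. The range check is an addition of this sketch, not part of the paper, because for small $n$ the large-sample formula can give a value outside $(0, 1)$.

```python
import math
from statistics import NormalDist


def lambda_beta(beta):
    """lambda_beta of (14): the standard normal point with upper-tail area beta."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - beta)


def xi_problem1(beta, gamma, sigma_at_gamma, n):
    """xi(beta, gamma, theta_hat) of (16); sigma_at_gamma is sigma(theta_hat, gamma) from (13)."""
    xi = gamma - lambda_beta(beta) * sigma_at_gamma / math.sqrt(n)
    if not 0.0 < xi < 1.0:
        # Safeguard added in this sketch: the large-sample formula can leave (0, 1) for small n.
        raise ValueError(f"xi = {xi:.4f} lies outside (0, 1); n is too small")
    return xi


print(xi_problem1(0.95, 0.90, 0.24, 400))   # approx. 0.9197
```

The tolerance limits are then $\varphi[\hat\theta, \xi]$ and $\psi[\hat\theta, \xi]$ with this $\xi$.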
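Now we will deal with Problem 2. We distinguish two cases.

(a) $\lim_{n \to \infty} N/n = \infty$.

It is easy to see that in this case the solution of Problem 2 is obtained from that of Problem 1 by substituting $\lambda$ for $\gamma$. The reason is that the sampling fluctuation of $M/N$ about $I(\hat\theta, \theta, \xi)$, of order $1/\sqrt{N}$, is then negligible compared with that of $I(\hat\theta, \theta, \xi)$ about $\xi$, which is of order $1/\sqrt{n}$. Hence for large $n$ the tolerance limits can be approximated by $\varphi[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ and $\psi[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ respectively.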





For these tolerance limits condition 2 is fulfilled in the limit, i.e.

$$\lim_{n \to \infty} P(M \ge \lambda N \mid \theta) = \beta.$$

(b) The integers $n$ and $N$ approach infinity while $N/n$ remains bounded.

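Denote $\sqrt{n}\,[I(\hat\theta, \theta, \xi) - \xi]$ by $u$ and $\sqrt{N}\,[M(\xi)/N - \xi]$ by $v$, where $M(\xi)$ denotes the number of observations in the second sample which fall between the limits $\varphi(\hat\theta, \xi)$ and $\psi(\hat\theta, \xi)$. For any fixed value of $u$, the conditional expected value of $M(\xi)/N$ is given by $\xi + u/\sqrt{n}$ and the conditional variance of $M(\xi)/N$ is given by

$$\frac{1}{N}\left(\xi + \frac{u}{\sqrt{n}}\right)\left(1 - \xi - \frac{u}{\sqrt{n}}\right).$$

Hence the conditional expected value of $v$ is equal to $\sqrt{N/n}\;u$, and the conditional variance of $v$ is equal to $(\xi + u/\sqrt{n})(1 - \xi - u/\sqrt{n})$, which tends to $\xi(1 - \xi)$. Since the limit distribution of $u$ is normal with zero mean and standard deviation $\sigma(\theta, \xi)$ given in (13), we find that the limit bivariate distribution of $u$ and $v$ is given by

$$\frac{1}{2\pi\,\sigma(\theta, \xi)\sqrt{\xi(1 - \xi)}} \exp\left[-\frac{u^2}{2\sigma^2(\theta, \xi)} - \frac{\left(v - \sqrt{N/n}\;u\right)^2}{2\xi(1 - \xi)}\right] du\,dv. \tag{18}$$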


From (18) it follows that the limit distribution of $v$ is normal with zero mean and variance

$$\xi(1 - \xi) + \frac{N}{n}\,\sigma^2(\theta, \xi). \tag{19}$$


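From (19) it follows easily that the probability that

$$\frac{M(\xi)}{N} \ge \xi + \frac{\lambda_\beta}{\sqrt{N}}\sqrt{\xi(1 - \xi) + \frac{N}{n}\,\sigma^2(\theta, \xi)} \tag{20}$$

converges to $\beta$ with $n \to \infty$. Let

$$\xi^*(\beta, \lambda, \theta) = \lambda - \frac{\lambda_\beta}{\sqrt{N}}\sqrt{\lambda(1 - \lambda) + \frac{N}{n}\,\sigma^2(\theta, \lambda)}. \tag{21}$$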


From (20) it follows that the probability that $M/N \ge \lambda$ holds converges to $\beta$ with $n \to \infty$. The letter $M$ denotes the number of observations in the second sample which lie between the limits $\varphi[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$ and $\psi[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$.
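Formula (21) differs from (16), with $\lambda$ in place of $\gamma$, only through the binomial term $\lambda(1-\lambda)$ contributed by the second sample. The short sketch below computes both; the function names are illustrative, and `sigma2_at_lam` stands for $\sigma^2(\hat\theta, \lambda)$, assumed to be supplied by the caller. Computing both side by side also shows that $\xi^*$ approaches the case (a) value as $N/n$ grows, since the term $(N/n)\sigma^2$ then dominates the square root.

```python
import math
from statistics import NormalDist


def xi_case_a(beta, lam, sigma2_at_lam, n):
    """Case (a), N/n -> infinity: (16) with lambda in place of gamma."""
    lam_beta = NormalDist().inv_cdf(1.0 - beta)           # lambda_beta of (14)
    return lam - lam_beta * math.sqrt(sigma2_at_lam / n)


def xi_star(beta, lam, sigma2_at_lam, n, N):
    """Case (b), N/n bounded: xi*(beta, lambda, theta_hat) of (21)."""
    lam_beta = NormalDist().inv_cdf(1.0 - beta)
    spread = math.sqrt(lam * (1.0 - lam) + (N / n) * sigma2_at_lam)
    return lam - lam_beta * spread / math.sqrt(N)


n, beta, lam, s2 = 400, 0.95, 0.90, 0.0576
print(xi_case_a(beta, lam, s2, n))         # approx. 0.9197
print(xi_star(beta, lam, s2, n, N=400))    # approx. 0.9316
```

With $N = n$ the second sample's own binomial scatter widens the required nominal content noticeably.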

We can summarize our results in the following

Theorem 2. Let $\varphi(\theta, \xi)$ and $\psi(\theta, \xi)$ be two functions satisfying (5) and (6). Two samples of size $n$ and $N$ respectively are drawn, and the maximum likelihood estimate $\hat\theta$ is calculated from the first sample only. Assume that conditions (a), (b) and (c) of Theorem 1 are satisfied. Let $\xi(\beta, \gamma, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ be defined by (16) and (21) respectively.

If $n$ and $N/n$ both approach infinity, the probability that $M/N \ge \lambda$ holds converges to $\beta$, where $M$ denotes the number of observations in the second sample which lie between the limits $\varphi[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ and $\psi[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$.

If $n$ and $N$ approach infinity while $N/n$ remains bounded, the probability that $M/N \ge \lambda$ holds converges to $\beta$, where $M$ denotes the number of observations in the second sample which lie between the limits $\varphi[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$ and $\psi[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$.

From Theorem 2 we obtain the following

Large sample solution of Problem 2. If $n$ and $N/n$ both approach infinity, the lower and upper tolerance limits can be approximated by $\varphi[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ and $\psi[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ respectively. If $n$ and $N$ both approach infinity while $N/n$ remains bounded, the tolerance limits can be approximated by $\varphi[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$ and $\psi[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$ respectively. The expressions $\xi(\beta, \lambda, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ are given by (16) and (21) respectively.


4. The multivariate case. For any positive $\xi < 1$ let $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ $(i = 1, \ldots, p)$ be $p$ pairs of functions of $\theta$ such that

$$\int_{\varphi_1(\theta,\xi)}^{\psi_1(\theta,\xi)} \cdots \int_{\varphi_p(\theta,\xi)}^{\psi_p(\theta,\xi)} f(x_1, \ldots, x_p, \theta)\,dx_1 \cdots dx_p = \xi. \tag{22}$$

If $f(x_1, \ldots, x_p, \theta)$ is a continuous function of $x_1, \ldots, x_p$, functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ $(i = 1, \ldots, p)$ satisfying (22) certainly exist. As in the univariate case, there will be infinitely many sets of $p$ pairs of functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ which satisfy (22). Since we wish to have tolerance limits as narrow as possible, we will try to choose the functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ so that $\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)$ is as small as possible. Since it is impossible to minimize all $p$ differences $\psi_1(\theta, \xi) - \varphi_1(\theta, \xi), \ldots, \psi_p(\theta, \xi) - \varphi_p(\theta, \xi)$ simultaneously, we will have to be satisfied with some compromise solution. For example, we could minimize the product $\prod_{i=1}^{p} [\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)]$ or some other function of the $p$ differences $\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)$. Another reasonable procedure would be to minimize





$\prod_{i=1}^{p} [\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)]$ subject to (22) and the condition that, for any $i$ and $j$, the ratio

$$\frac{\psi_i(\theta, \xi) - \varphi_i(\theta, \xi)}{\psi_j(\theta, \xi) - \varphi_j(\theta, \xi)}$$

is equal to the ratio of the standard deviation of $x_i$ to that of $x_j$.

Here we will deal with the problem of choosing tolerance limits for the variates $x_1, \ldots, x_p$ after the functions $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ have been chosen. Since the theory of the multivariate case is very similar to that of the univariate case, we will merely outline it briefly.

As tolerance limits for $x_i$ we will use the functions $\varphi_i(\hat\theta, \xi)$ and $\psi_i(\hat\theta, \xi)$, where the value of $\xi$ has to be properly determined. Problem 1 is solved if we can determine $\xi$ as a function of $\beta$ and $\gamma$ so that


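$$P\left[\int_{\varphi_1(\hat\theta,\xi)}^{\psi_1(\hat\theta,\xi)} \cdots \int_{\varphi_p(\hat\theta,\xi)}^{\psi_p(\hat\theta,\xi)} f(x_1, \ldots, x_p, \theta)\,dx_1 \cdots dx_p \ge \gamma \,\middle|\, \theta\right] = \beta. \tag{23}$$

Problem 2 is solved if we determine $\xi$ as a function of $\beta$, $\lambda$ and $N$ such that condition 2 is fulfilled. Let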


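$$I(\hat\theta, \theta, \xi) = \int_{\varphi_1(\hat\theta,\xi)}^{\psi_1(\hat\theta,\xi)} \cdots \int_{\varphi_p(\hat\theta,\xi)}^{\psi_p(\hat\theta,\xi)} f(x_1, \ldots, x_p, \theta)\,dx_1 \cdots dx_p \tag{24}$$

and let

$$J_r(\hat\theta, \theta, \xi, x_r) = \int_{\varphi_1(\hat\theta,\xi)}^{\psi_1(\hat\theta,\xi)} \cdots \int_{\varphi_{r-1}(\hat\theta,\xi)}^{\psi_{r-1}(\hat\theta,\xi)} \int_{\varphi_{r+1}(\hat\theta,\xi)}^{\psi_{r+1}(\hat\theta,\xi)} \cdots \int_{\varphi_p(\hat\theta,\xi)}^{\psi_p(\hat\theta,\xi)} f(x_1, \ldots, x_p, \theta)\,dx_1 \cdots dx_{r-1}\,dx_{r+1} \cdots dx_p. \tag{25}$$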


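We have

$$\left.\frac{\partial I(\hat\theta, \theta, \xi)}{\partial\hat\theta_i}\right|_{\hat\theta = \theta} = \sum_{r=1}^{p} \left\{ J_r[\theta, \theta, \xi, \psi_r(\theta, \xi)]\,\frac{\partial\psi_r(\theta, \xi)}{\partial\theta_i} - J_r[\theta, \theta, \xi, \varphi_r(\theta, \xi)]\,\frac{\partial\varphi_r(\theta, \xi)}{\partial\theta_i} \right\}. \tag{26}$$

Assume that the partial derivatives $\partial\varphi_r(\theta, \xi)/\partial\theta_i$ and $\partial\psi_r(\theta, \xi)/\partial\theta_i$ $(i = 1, \ldots, k)$ are continuous functions and that (26) is not zero for at least one value of $i$. It then follows from our Lemma that $\sqrt{n}\,[I(\hat\theta, \theta, \xi) - I(\theta, \theta, \xi)] = \sqrt{n}\,[I(\hat\theta, \theta, \xi) - \xi]$ is in the limit normally distributed with mean value zero and variance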

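$$\begin{aligned}
\sigma^2(\theta, \xi) = \sum_{r=1}^{p}\sum_{s=1}^{p}\sum_{i=1}^{k}\sum_{j=1}^{k} \Big\{ & \frac{\partial\psi_r(\theta, \xi)}{\partial\theta_i}\,\frac{\partial\psi_s(\theta, \xi)}{\partial\theta_j}\, J_r[\theta, \theta, \xi, \psi_r(\theta, \xi)]\, J_s[\theta, \theta, \xi, \psi_s(\theta, \xi)] \\
&- 2\,\frac{\partial\psi_r(\theta, \xi)}{\partial\theta_i}\,\frac{\partial\varphi_s(\theta, \xi)}{\partial\theta_j}\, J_r[\theta, \theta, \xi, \psi_r(\theta, \xi)]\, J_s[\theta, \theta, \xi, \varphi_s(\theta, \xi)] \\
&+ \frac{\partial\varphi_r(\theta, \xi)}{\partial\theta_i}\,\frac{\partial\varphi_s(\theta, \xi)}{\partial\theta_j}\, J_r[\theta, \theta, \xi, \varphi_r(\theta, \xi)]\, J_s[\theta, \theta, \xi, \varphi_s(\theta, \xi)] \Big\}\, \sigma_{ij}(\theta)
\end{aligned} \tag{27}$$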





where $\|\sigma_{ij}(\theta)\|$ is the limit covariance matrix of $\sqrt{n}(\hat\theta_1 - \theta_1), \ldots, \sqrt{n}(\hat\theta_k - \theta_k)$.
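As in the univariate case, (27) is the quadratic form $g^\top \|\sigma_{ij}\|\, g$ built from the gradient (26). The factor 2 in the middle term appears because the two cross products become equal after interchanging $(r, i)$ with $(s, j)$, since $\|\sigma_{ij}\|$ is symmetric. The minimal sketch below assumes that the values $J_r[\theta, \theta, \xi, \psi_r(\theta, \xi)]$ and $J_r[\theta, \theta, \xi, \varphi_r(\theta, \xi)]$ have already been obtained, for instance by numerical integration (not shown). The names and the numbers in the usage lines are illustrative only. The usage lines also sum the three-term form of (27) directly, for comparison.

```python
def gradient_26(J_psi, J_phi, dpsi, dphi):
    """Components of (26).

    J_psi[r], J_phi[r]: J_r[theta, theta, xi, psi_r(theta, xi)] and J_r[theta, theta, xi, phi_r(theta, xi)].
    dpsi[r][i], dphi[r][i]: derivatives of psi_r and phi_r with respect to theta_i.
    """
    p, k = len(J_psi), len(dpsi[0])
    return [sum(J_psi[r] * dpsi[r][i] - J_phi[r] * dphi[r][i] for r in range(p))
            for i in range(k)]


def sigma_squared_27(J_psi, J_phi, dpsi, dphi, cov):
    """sigma^2(theta, xi) of (27), as the quadratic form g' Sigma g with g from (26)."""
    g = gradient_26(J_psi, J_phi, dpsi, dphi)
    k = len(g)
    return sum(g[i] * g[j] * cov[i][j] for i in range(k) for j in range(k))


# Usage with arbitrary illustrative numbers: p = 2 variates, k = 3 parameters.
Jp, Jm = [0.3, 0.5], [0.2, 0.4]
dps = [[1.0, 0.5, 0.0], [0.2, -1.0, 0.7]]
dph = [[0.9, -0.3, 0.1], [0.0, 0.4, -0.2]]
cov = [[2.0, 0.3, 0.1], [0.3, 1.0, -0.2], [0.1, -0.2, 1.5]]
quadratic = sigma_squared_27(Jp, Jm, dps, dph, cov)
# The three-term expansion printed as (27), summed term by term.
expanded = sum(
    (dps[r][i] * dps[s][j] * Jp[r] * Jp[s]
     - 2 * dps[r][i] * dph[s][j] * Jp[r] * Jm[s]
     + dph[r][i] * dph[s][j] * Jm[r] * Jm[s]) * cov[i][j]
    for r in range(2) for s in range(2) for i in range(3) for j in range(3))
```

The quadratic-form route needs only $O(pk + k^2)$ operations instead of the $O(p^2 k^2)$ of the literal quadruple sum.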

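For any positive $\beta < 1$, let $\lambda_\beta$ be the real value defined by the equation

$$\frac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-t^2/2}\,dt = \beta. \tag{28}$$

Let

$$\xi(\beta, \gamma, \theta) = \gamma - \frac{\lambda_\beta\,\sigma(\theta, \gamma)}{\sqrt{n}} \tag{29}$$

and

$$\xi^*(\beta, \lambda, \theta) = \lambda - \frac{\lambda_\beta}{\sqrt{N}}\sqrt{\lambda(1 - \lambda) + \frac{N}{n}\,\sigma^2(\theta, \lambda)}. \tag{30}$$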


We can easily prove the following two theorems:

Theorem 3. Let $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ $(i = 1, \ldots, p)$ be $p$ pairs of functions which satisfy (22). Let the functions $I(\hat\theta, \theta, \xi)$, $\sigma^2(\theta, \xi)$ and $\xi(\beta, \gamma, \theta)$ be defined by (24), (27) and (29) respectively. Denote by $\theta_1^0, \ldots, \theta_k^0$ the true values of the parameters $\theta_1, \ldots, \theta_k$. It is assumed that there exist two positive numbers $\epsilon$ and $\delta$ such that the following three conditions are fulfilled:

(a) For any point $\theta$ for which $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$, the limit joint distribution of $\sqrt{n}(\hat\theta_1 - \theta_1), \ldots, \sqrt{n}(\hat\theta_k - \theta_k)$, calculated under the assumption that $\theta$ is the true parameter point, is normal with zero means and a finite non-singular covariance matrix $\|\sigma_{ij}(\theta)\|$, where $\sigma_{ij}(\theta)$ is a continuous function of $\theta$ in the domain $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$.

(b) The partial derivatives $\partial\varphi_r(\theta, \xi)/\partial\theta_i$ and $\partial\psi_r(\theta, \xi)/\partial\theta_i$ $(r = 1, \ldots, p;\ i = 1, \ldots, k)$ are continuous functions of $\theta$ and $\xi$ in the domain $\sum_{i=1}^{k} (\theta_i - \theta_i^0)^2 < \epsilon$ and $|\xi - \gamma| < \delta$.

(c) At least one of the partial derivatives $\partial I(\hat\theta, \theta^0, \gamma)/\partial\hat\theta_i$ at $\hat\theta = \theta^0$ $(i = 1, \ldots, k)$ is not equal to zero.

Then the probability that $I[\hat\theta, \theta^0, \xi(\beta, \gamma, \hat\theta)] \ge \gamma$ holds converges to $\beta$ with $n \to \infty$.

Theorem 4. Let $\varphi_i(\theta, \xi)$ and $\psi_i(\theta, \xi)$ $(i = 1, \ldots, p)$ be $p$ pairs of functions which satisfy (22). Two samples of size $n$ and $N$ respectively are drawn, and the maximum likelihood estimate $\hat\theta$ is calculated from the first sample only. Assume that conditions (a), (b) and (c) of Theorem 3 are fulfilled, and let $\xi(\beta, \gamma, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ be defined by (29) and (30) respectively. Denote by $y_{i\alpha}$ the outcome of the $\alpha$-th observation on the $i$-th variate in the second sample.





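If $n$ and $N/n$ both approach infinity, the probability that $M \ge \lambda N$ holds converges to $\beta$, where $M$ denotes the number of different values of $\alpha$ for which

$$\varphi_i[\hat\theta, \xi(\beta, \lambda, \hat\theta)] < y_{i\alpha} < \psi_i[\hat\theta, \xi(\beta, \lambda, \hat\theta)] \qquad (i = 1, \ldots, p).$$

If $n$ and $N$ approach infinity while $N/n$ remains bounded, the probability that $M \ge \lambda N$ holds converges to $\beta$, where $M$ denotes the number of different values of $\alpha$ for which

$$\varphi_i[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)] < y_{i\alpha} < \psi_i[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)] \qquad (i = 1, \ldots, p).$$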


The proofs of Theorems 3 and 4 are omitted since they are similar to the proofs of Theorems 1 and 2.

From Theorem 3 we obtain the following

Large sample solution of Problem 1. For large $n$ we can approximate the lower and upper tolerance limits for $x_i$ by $\varphi_i[\hat\theta, \xi(\beta, \gamma, \hat\theta)]$ and $\psi_i[\hat\theta, \xi(\beta, \gamma, \hat\theta)]$ respectively, where $\xi(\beta, \gamma, \theta)$ is given by (29).

From Theorem 4 we obtain the following

Large sample solution of Problem 2. If $n$ and $N/n$ approach infinity, the lower and upper tolerance limits for $x_i$ can be approximated by $\varphi_i[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ and $\psi_i[\hat\theta, \xi(\beta, \lambda, \hat\theta)]$ respectively. If $n$ and $N$ both approach infinity while $N/n$ remains bounded, the tolerance limits for $x_i$ can be approximated by $\varphi_i[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$ and $\psi_i[\hat\theta, \xi^*(\beta, \lambda, \hat\theta)]$ respectively. The expressions $\xi(\beta, \lambda, \theta)$ and $\xi^*(\beta, \lambda, \theta)$ are defined in (29) and (30) respectively.


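5. An example. Let $x$ be a normally distributed variate with mean value $\theta_1$ and standard deviation $\theta_2$, i.e. the probability density function of $x$ is given by

$$f(x, \theta_1, \theta_2) = \frac{1}{\sqrt{2\pi}\,\theta_2}\, e^{-(x - \theta_1)^2/(2\theta_2^2)}.$$

For any positive $\xi < 1$ let $\rho(\xi)$ be the value for which

$$\frac{1}{\sqrt{2\pi}} \int_{-\rho(\xi)}^{\rho(\xi)} e^{-t^2/2}\,dt = \xi.$$

Then the functions

$$\varphi(\theta, \xi) = \theta_1 - \rho(\xi)\,\theta_2 \quad\text{and}\quad \psi(\theta, \xi) = \theta_1 + \rho(\xi)\,\theta_2$$

satisfy conditions (5) and (6). We have

$$\hat\theta_1 = \frac{x_1 + \cdots + x_n}{n} = \bar x \quad\text{and}\quad \hat\theta_2 = \sqrt{\frac{\sum_{\alpha=1}^{n} (x_\alpha - \bar x)^2}{n}}.$$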




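The variance of $\sqrt{n}(\hat\theta_1 - \theta_1)$ is equal to $\theta_2^2$, and the limit variance of $\sqrt{n}(\hat\theta_2 - \theta_2)$ is equal to $\tfrac{1}{2}\theta_2^2$. Since the covariance of $\hat\theta_1$ and $\hat\theta_2$ is equal to zero, we obtain from (13)

$$\sigma^2(\theta, \xi) = \frac{1}{\pi}\,\rho^2(\xi)\,e^{-\rho^2(\xi)}.$$

Here the terms involving $\hat\theta_1$ cancel, since $f$ has the same value at both limits and $\partial\varphi/\partial\theta_1 = \partial\psi/\partial\theta_1 = 1$.

Hence for large $n$ the tolerance limits satisfying (1) can be approximated by $\hat\theta_1 - \rho(\xi)\hat\theta_2$ and $\hat\theta_1 + \rho(\xi)\hat\theta_2$ respectively, where

$$\xi = \gamma - \frac{\lambda_\beta}{\sqrt{\pi n}}\,\rho(\gamma)\,e^{-\frac{1}{2}\rho^2(\gamma)}$$

and $\lambda_\beta$ is the value determined by the equation

$$\frac{1}{\sqrt{2\pi}} \int_{\lambda_\beta}^{\infty} e^{-t^2/2}\,dt = \beta.$$

If $n$ and $N$ are large, the tolerance limits satisfying (2) can be approximated by $\hat\theta_1 - \rho(\xi^*)\hat\theta_2$ and $\hat\theta_1 + \rho(\xi^*)\hat\theta_2$ respectively, where $\xi^*$ is given by (21) with $\sigma^2(\theta, \lambda) = \rho^2(\lambda)\,e^{-\rho^2(\lambda)}/\pi$.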
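The example can be carried through numerically. The sketch below uses the Python standard library, and its function names are illustrative. It computes the maximum likelihood estimates (the standard deviation with divisor $n$), takes $\rho(\xi)$ as the $(1+\xi)/2$ standard normal quantile, and returns the approximate limits for condition (1). It covers only the Problem 1 limits; for condition (2) one would use $\xi^*$ from (21) instead of $\xi$. The `ValueError` is a safeguard added here, because for small $n$ the formula can give $\xi \ge 1$.

```python
import math
from statistics import NormalDist, fmean

STD = NormalDist()


def rho(xi):
    """rho(xi): the central interval (-rho, rho) of N(0, 1) has probability xi."""
    return STD.inv_cdf((1.0 + xi) / 2.0)


def sigma_normal(xi):
    """sigma(theta, xi) of the example: rho(xi) * exp(-rho(xi)^2 / 2) / sqrt(pi)."""
    r = rho(xi)
    return r * math.exp(-r * r / 2.0) / math.sqrt(math.pi)


def normal_limits(xs, beta, gamma):
    """Approximate limits theta1_hat -/+ rho(xi) * theta2_hat satisfying (1)."""
    n = len(xs)
    t1 = fmean(xs)                                        # ML estimate of the mean
    t2 = math.sqrt(sum((x - t1) ** 2 for x in xs) / n)    # ML estimate of the sd (divisor n)
    lam_beta = STD.inv_cdf(1.0 - beta)                    # lambda_beta
    xi = gamma - lam_beta * sigma_normal(gamma) / math.sqrt(n)
    if not 0.0 < xi < 1.0:
        raise ValueError("n is too small for the large-sample approximation")
    half_width = rho(xi) * t2
    return t1 - half_width, t1 + half_width, xi


# Usage: an artificial, perfectly symmetric "sample" of 400 normal scores.
xs = [STD.inv_cdf((i + 0.5) / 400) for i in range(400)]
lower, upper, xi = normal_limits(xs, beta=0.95, gamma=0.90)
```

With real data `xs` would be the first sample. The symmetric normal scores used here simply make the result easy to check, since the limits must then be symmetric about 0.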



STATISTICAL PREDICTION WITH SPECIAL REFERENCE TO THE
PROBLEM OF TOLERANCE LIMITS¹

By S. S. Wilks 
Princeton University 

1. Introduction. Statistical methodology is becoming recognized in industry as an effective tool for dealing with certain problems of inspection and quality control in mass production. Quality control experts have found statistical methods useful in detecting excessive variation in a given quality characteristic of a product from a series of observations on the given quality characteristic, and in isolating the causes of such variations back in the materials or operations involved in manufacturing the product. By a process of successive detection and elimination of causes of variability, a controlled state of quality is established. A practical statistical procedure for establishing a controlled state of quality has been developed by Shewhart.² More recently, manuals for routine application of this procedure have been issued by the American Standards Association.³

In this paper we do not propose to go into a discussion of the application of the well known Shewhart procedure. The reader may refer to the literature mentioned in footnotes 2 and 3 for such discussion. It is sufficient to remark that experience shows that the application of this procedure leads to a controlled state of quality. Such a state of control provides a basis for making statistical predictions about measurements on the given quality characteristic in future production.

More specifically, suppose a given quality characteristic of a given product is measured by a variable $X$, such that $X$ has a specific value for each individual product-piece. For example, the product may be a given type of fuse and $X$ may be the blowing time in seconds. A product-piece would be a single fuse, and $X$ would take on a value for each fuse. Thus, for a sequence of $n$ fuses taken from the production line, there would be a corresponding sequence of values of $X$, say $X_1, X_2, \ldots, X_n$. If a state of control has been established with respect to blowing time as measured by $X$, then the sequence of values of $X$ will "behave like a random sequence." By this we mean that the sequence will be such that we can safely assume that it can be described mathematically by regarding $X$ as a continuous random variable, i.e., such that there exists some


¹ An expository paper presented at a joint session of the American Mathematical Society and the Institute of Mathematical Statistics at Poughkeepsie, September 1942.

² W. A. Shewhart, Control of Quality of Manufactured Product, D. Van Nostrand Company, New York, 1931.

³ Guide for Quality Control and Control Chart Method of Analysing Data (1941), and Control Chart Method of Controlling Quality During Production (1942), American Standards Association, New York.







probability function $f(x)$ which describes the distribution of values of $X$, such that $\int_a^b f(x)\,dx$ is the probability that $a < X < b$ for any two real numbers $a$ and $b$. Now, suppose we consider a sequence or sample $S_1$ of $n$ values of $X$, and let $X_1$ and $X_n$ be the smallest and largest values of $X$ in the sequence. The types of questions with which we are concerned are the following.

If a further sample, say $S_2$, of $N$ values of $X$ is taken, what is the probability $P$ that at least $N_0$ of the values will lie between $X_1$ and $X_n$ as determined by $S_1$? If we choose a given probability $\alpha$, at least what proportion of values of $X$ in an indefinitely large sample $S_2$ will fall between $X_1$ and $X_n$ of $S_1$ with probability $\alpha$? What is the probability $P'$ that at least $N_0$ of the values of $S_2$ will exceed $X_1$ of $S_1$? At least what proportion of values of $X$ in an indefinitely large sample $S_2$ will exceed $X_1$ with probability $\alpha$?

These questions suggest several of a more general nature which can be treated by methods similar to those which will be discussed. For example, instead of taking $X_1$ and $X_n$, i.e. the smallest and largest items in $S_1$, as tolerance limits, we could use $X_m$ and $X_{n-m+1}$. More generally, we may define $100R_\alpha\%$ tolerance limits $L_1(x_1, x_2, \ldots, x_n)$ and $L_2(x_1, x_2, \ldots, x_n)$ for probability level $\alpha$ of a sample $S_1$ of size $n$ from a population with distribution $f(x)\,dx$. They are two functions of the $x$'s in $S_1$ such that the probability is $\alpha$ that at least $100R_\alpha\%$ of the $X$'s of a further indefinitely large sample $S_2$ (i.e. the population) will lie between $L_1$ and $L_2$. Or more briefly

$$P\left(\int_{L_1}^{L_2} f(x)\,dx \ge R_\alpha\right) = \alpha.$$

The same notion clearly applies if $S_2$ is a finite sample of size $N$ rather than an indefinitely large one. In this case we would be interested in the largest integer $N_\alpha$ such that the probability is at least $\alpha$ that at least $100R_\alpha\%$ $(R_\alpha = N_\alpha/N)$ of the $X$'s in $S_2$ would lie between $L_1$ and $L_2$. In most practical situations we are able to assume nothing more about $f(x)$ than that it is a probability density function. We make only this assumption here. The only functions of the values of $X$ in $S_1$ that we shall consider here in setting tolerance limits are order statistics, i.e. the ordered values of $X$, because the results will then be fairly simple and independent of $f(x)$.

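2. A General Probability Formula. It will be convenient to derive at this stage a general probability formula from which we can obtain certain special cases as we need them.

Let $X_1, X_2, \ldots, X_n$ be the $n$ values of $X$ in $S_1$ arranged in order of increasing magnitude. Let $r_1, r_2, \ldots, r_k$ be integers such that $1 \le r_1 < r_2 < \cdots < r_k \le n$. Let $x_{r_1}, x_{r_2}, \ldots, x_{r_k}$ be $k$ real numbers. Let

$$p_1 = \int_{-\infty}^{x_{r_1}} f(x)\,dx, \quad p_i = \int_{x_{r_{i-1}}}^{x_{r_i}} f(x)\,dx \ (i = 2, \ldots, k), \quad p_{k+1} = \int_{x_{r_k}}^{\infty} f(x)\,dx,$$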





from which

$$f(x_{r_1})\,dx_{r_1} = dp_1, \quad f(x_{r_2})\,dx_{r_2} = dp_1 + dp_2, \quad \ldots, \quad f(x_{r_k})\,dx_{r_k} = dp_1 + \cdots + dp_k.$$

Then, assuming $X_1, X_2, \ldots, X_n$ to be a random sample (ordered) from a population with probability element $f(x)\,dx$, it follows from the multinomial distribution law⁴ that the probability of $x_{r_i} < X_{r_i} < x_{r_i} + dx_{r_i}$ $(i = 1, 2, \ldots, k)$ is given by

$$\frac{n!}{(r_1 - 1)!\,(r_2 - r_1 - 1)! \cdots (r_k - r_{k-1} - 1)!\,(n - r_k)!}\; p_1^{r_1 - 1}\, p_2^{r_2 - r_1 - 1} \cdots p_k^{r_k - r_{k-1} - 1}\, p_{k+1}^{n - r_k}\; dp_1\,dp_2 \cdots dp_k \tag{1}$$


except for terms of order higher than $dp_1\,dp_2 \cdots dp_k$. Given that $X_{r_1} = x_{r_1}, \ldots, X_{r_k} = x_{r_k}$, the conditional probability that $N_1, N_2, \ldots, N_{k+1}$ $\left(\sum_{i=1}^{k+1} N_i = N\right)$ of the values of $X$ in $S_2$ will fall in the intervals $(-\infty, x_{r_1}), (x_{r_1}, x_{r_2}), \ldots, (x_{r_k}, \infty)$ respectively is, by the multinomial law,

$$\frac{N!}{N_1!\,N_2! \cdots N_{k+1}!}\; p_1^{N_1}\, p_2^{N_2} \cdots p_{k+1}^{N_{k+1}}. \tag{2}$$


The joint probability law of $X_{r_1}, X_{r_2}, \ldots, X_{r_k}$ and $N_1, N_2, \ldots, N_{k+1}$ $\left(\sum N_i = N\right)$ is given by the product of (1) and (2). Integrating this product with respect to the $x$'s (i.e. the $p$'s), we find the probability law of the $N$'s to be

$$\frac{N!\,n!\,(N_1 + r_1 - 1)!\,(N_2 + r_2 - r_1 - 1)! \cdots (N_k + r_k - r_{k-1} - 1)!\,(N_{k+1} + n - r_k)!}{(r_1 - 1)!\,(r_2 - r_1 - 1)! \cdots (r_k - r_{k-1} - 1)!\,(n - r_k)!\,(N + n)!\,N_1!\,N_2! \cdots N_{k+1}!}, \tag{3}$$

which is clearly independent of $f(x)$. This result can be derived by direct combinatorial methods, but the present derivation provides a simple proof that the result is independent of $f(x)$.
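Formula (3) is easy to evaluate exactly with integer factorials. The sketch below defines a hypothetical helper `prob_counts` that returns an exact rational number through Python's `fractions.Fraction`. Setting $k = 1$ and $r_1 = 1$ should reproduce the one-limit law (4) derived in the next section. That gives a convenient check, as does the fact that the probabilities of all possible count vectors sum to 1.

```python
from fractions import Fraction
from math import factorial


def prob_counts(n, ranks, counts):
    """Probability law (3) of (N_1, ..., N_{k+1}).

    n: size of the first sample S_1.
    ranks: r_1 < ... < r_k, the ranks of the order statistics used as cut points.
    counts: N_1, ..., N_{k+1}, the numbers of S_2 values falling in the k + 1 intervals.
    """
    k = len(ranks)
    if len(counts) != k + 1:
        raise ValueError("need k + 1 counts for k cut points")
    if not (1 <= ranks[0] and ranks[-1] <= n
            and all(a < b for a, b in zip(ranks, ranks[1:]))):
        raise ValueError("ranks must satisfy 1 <= r_1 < ... < r_k <= n")
    r = [0] + list(ranks)
    # Exponents r_1 - 1, r_2 - r_1 - 1, ..., r_k - r_{k-1} - 1, n - r_k appearing in (1).
    gaps = [r[i + 1] - r[i] - 1 for i in range(k)] + [n - r[k]]
    N = sum(counts)
    num = factorial(N) * factorial(n)
    den = factorial(N + n)
    for g, c in zip(gaps, counts):
        num *= factorial(c + g)
        den *= factorial(g) * factorial(c)
    return Fraction(num, den)


# Usage: one lower limit X_1 (k = 1, r_1 = 1); N_1 values fall below it and N_0 = N_2 above.
n, N = 6, 5
law = [prob_counts(n, [1], [N - N0, N0]) for N0 in range(N + 1)]
```

Exact arithmetic avoids any question of rounding when the factorials become large.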


3. The Problem of One Tolerance Limit. There are problems in quality control in which it is important to consider only one tolerance limit. For example, in testing breaking strength of steel wire the most significant tolerance limit is the lower one. The problem of prediction in this case is as follows:


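⁴ Which states that if a trial results in one and only one of the mutually exclusive events $E_1, E_2, \ldots, E_k$, the probability $P$ that in a total of $n$ trials $n_1$ will result in $E_1$, $n_2$ in $E_2$, ..., $n_k$ in $E_k$ $\left(\sum n_i = n\right)$, is given by

$$P = \frac{n!}{n_1!\,n_2! \cdots n_k!}\; p_1^{n_1}\, p_2^{n_2} \cdots p_k^{n_k},$$

where $p_1, p_2, \ldots, p_k$ $\left(\sum p_i = 1\right)$ are the probabilities of a single trial resulting in $E_1, E_2, \ldots, E_k$ respectively.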





Suppose the given quality characteristic, as measured by $X$, is in a state of statistical control, and that a sequence of $n$ measurements on $X$ has been made. Let $X_1$ be the smallest of the $n$ values. What is the probability that at least $N_0$ of $N$ further measurements on $X$ will exceed the value $X_1$ as determined by the initial sample? Instead of considering the smallest value of $X$ as the lower tolerance limit, we could just as easily choose the second smallest, or any other small order statistic, but the case of the smallest value is perhaps of greater practical interest than any other case. The problem of an upper tolerance limit is entirely similar to that of a lower tolerance limit.

Table I

Values of $N_\alpha$ and $R_\alpha$ for $\alpha = 0.99$ and $0.95$ for several combinations of values of $N$ and $n$, and for the problem of one tolerance limit. (For $N = \infty$, $R_\alpha$ is denoted by $\bar R_\alpha$.)

                       α = 0.99           α = 0.95
     n       N     N_.99    R_.99     N_.95    R_.95
    10      10        5     .500         7     .700
    10      20       11     .550        14     .700
    10       ∞        -     .631         -     .741
    50      50       44     .880        46     .920
    50     100       90     .900        93     .930
    50       ∞        -     .912         -     .942
   100     100       94     .940        96     .960
   100     200      189     .945       193     .965
   100       ∞        -     .955         -     .970
   500     500      494     .988       496     .992
   500    1000      989     .989       993     .993
   500       ∞        -     .991         -     .994


The probability $P_1(N_0)$ that $N_0$ of the $N$ further measurements will exceed the smallest value of $X$ in an initially drawn sample of size $n$ is given by (3) for $k = 1$, $r_1 = 1$, $N_2 = N_0$, $N_1 = N - N_0$, i.e.

$$P_1(N_0) = n\,\frac{N!\,(N_0 + n - 1)!}{N_0!\,(N + n)!}. \tag{4}$$

Values of $P_1(N_0)$ can be easily calculated by using the recursion formula

$$P_1(N_0 - 1) = \frac{N_0}{N_0 + n - 1}\,P_1(N_0). \tag{5}$$


















For given values of $N$, $n$ and $\alpha$ we are interested in the largest integer $N_\alpha$ for which

$$\sum_{N_0 = N_\alpha}^{N} P_1(N_0) \ge \alpha. \tag{6}$$

If we set $N_\alpha/N = R_\alpha$ and set $\lim_{N \to \infty} R_\alpha = \bar R_\alpha$, it can be verified that the value of $\bar R_\alpha$ is given by solving the following equation for $\bar R_\alpha$:

$$n \int_{\bar R_\alpha}^{1} \xi^{n-1}\,d\xi = \alpha. \tag{7}$$

Since the integral equals $1 - \bar R_\alpha^{\,n}$, this gives $\bar R_\alpha = (1 - \alpha)^{1/n}$. It will be observed that $n\xi^{n-1}\,d\xi$ is, to within terms of order $d\xi$, the probability that $\xi < \int_{X_1}^{\infty} f(x)\,dx < \xi + d\xi$ in samples of size $n$ from a distribution with probability element $f(x)\,dx$, where $X_1$ is the smallest value of $X$ in the sample. The statistical interpretation of (7) is simply this: the probability is $\alpha$ that the proportion of values of $X$ exceeding $X_1$ in a further indefinitely large sample is at least $\bar R_\alpha$.

Choosing $\alpha = 0.99$ and $0.95$, Table I shows values of $N_\alpha$ and $R_\alpha$ for various combinations of values of $n$ and $N$ for the case of one tolerance limit. The table indicates the degree of precision with which predictions about a single tolerance limit can be made from a sample of size $n$ about a further sample of size $N$, for a few important values of $n$ and $N$. It should be noted that each prediction is made concerning a pair of samples, i.e. an initial sample of size $n$ and a further sample of size $N$, and that the prediction holds for any function $f(x)$. Thus, as a typical entry, we may state that if a sample of 100 is drawn and also a sample of 200, then the probability is 0.99 (approx.) that the $X$'s of at least 189 (or 94.5%) of the cases in the second sample will exceed the smallest $X$ in the first sample.
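The recursion (5) makes the search in (6) straightforward. Start from $P_1(N) = n/(N+n)$, which is (4) at $N_0 = N$, step downward, and accumulate the upper tail until it first reaches $\alpha$. The first $N_0$ at which this happens is the largest $N_\alpha$. The minimal Python sketch below does this in floating point; the function names are illustrative. The limiting value uses the closed form $\bar R_\alpha = (1-\alpha)^{1/n}$ of (7). For example, the search gives $N_{.99} = 189$ for $n = 100$, $N = 200$, the entry quoted above.

```python
def one_limit_probs(n, N):
    """P_1(N_0) of (4) for N_0 = 0, ..., N, generated downward with the recursion (5)."""
    p = [0.0] * (N + 1)
    p[N] = n / (N + n)                              # (4) at N_0 = N
    for N0 in range(N, 0, -1):
        p[N0 - 1] = N0 / (N0 + n - 1) * p[N0]       # recursion (5)
    return p


def n_alpha_one_limit(n, N, alpha):
    """Largest N_alpha with sum_{N_0 >= N_alpha} P_1(N_0) >= alpha, as required by (6)."""
    p = one_limit_probs(n, N)
    tail = 0.0
    for N0 in range(N, -1, -1):                     # the tail sum grows as N_0 decreases
        tail += p[N0]
        if tail >= alpha:
            return N0
    return 0                                        # reached only through rounding when alpha is near 1


def r_bar_one_limit(n, alpha):
    """Limiting value from (7): n * integral_R^1 xi^(n-1) dxi = 1 - R^n = alpha."""
    return (1.0 - alpha) ** (1.0 / n)


print(n_alpha_one_limit(100, 200, 0.99))   # 189, as in Table I
```

Exact rational arithmetic could replace the floats. For the sample sizes in Table I, however, the tail sums sit far enough from $\alpha$ at the decisive steps that rounding does not change the result.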

4. The Problem of Two Tolerance Limits. Again, suppose the given quality characteristic as measured by $X$ is in a state of statistical control and that a sequence of $n$ measurements is made on $X$. Let $X_1$ and $X_n$ be the smallest and largest values of $X$ respectively. The question to be considered now is the following: What is the probability that at least $N_0$ of $N$ further measurements on $X$ will lie between the values $X_1$ and $X_n$, as determined by the initial sample?

We proceed by considering the special case of (3) for which $k = 2$, $r_1 = 1$, $r_2 = n$, $N_2 = N_0$, $N_3 = N - N_0 - N_1$. We find for the joint distribution of $N_1$ and $N_0$

$$P(N_1, N_0) = \frac{N!\,n!\,(N_0 + n - 2)!}{(n - 2)!\,N_0!\,(N + n)!}. \tag{8}$$





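To obtain the distribution of $N_0$, we simply sum (8) with respect to $N_1$ from 0 to $N - N_0$. Since (8) does not depend on $N_1$, this amounts to multiplying by the number $N - N_0 + 1$ of its values, and we obtain

$$P_2(N_0) = n(n - 1)(N - N_0 + 1)\,\frac{N!\,(N_0 + n - 2)!}{N_0!\,(N + n)!}. \tag{9}$$

A convenient recursion formula for computation purposes is

$$P_2(N_0 - 1) = \frac{N_0\,(N - N_0 + 2)}{(N - N_0 + 1)(N_0 + n - 2)}\,P_2(N_0). \tag{10}$$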

For given values of $N$, $n$ and $\alpha$ we require the largest value of $N_\alpha$ for which

$$\sum_{N_0 = N_\alpha}^{N} P_2(N_0) \ge \alpha. \tag{11}$$

Setting $N_\alpha/N = R_\alpha$ and $\lim_{N \to \infty} R_\alpha = \bar R_\alpha$, one finds that $\bar R_\alpha$ is given by solving the equation⁵

$$n(n - 1) \int_{\bar R_\alpha}^{1} \xi^{n-2}(1 - \xi)\,d\xi = \alpha \tag{12}$$

for $\bar R_\alpha$. It can be verified that $n(n - 1)\xi^{n-2}(1 - \xi)\,d\xi$ is, to within terms of order $d\xi$, the probability that $\xi < \int_{X_1}^{X_n} f(x)\,dx < \xi + d\xi$. This shows that the left side of (12) is the probability that the proportion of an indefinitely large number of further values of $X$ lying between $X_1$ and $X_n$ is at least $\bar R_\alpha$.

Table II gives, for the case of two tolerance limits, values of $N_\alpha$ and $R_\alpha$ for several important combinations of $n$ and $N$, including limiting values $\bar R_\alpha$ of $R_\alpha$ for indefinitely large $N$.

It should be noted that the problem of two tolerance limits can be immediately extended to the case where the lower and upper tolerance limits may be any two of the order statistics in $S_1$.
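The same downward search works for two tolerance limits, using (9) and (10) and starting from $P_2(N) = n(n-1)/[(N+n)(N+n-1)]$. The integral in (12) has the closed form $1 - n\bar R_\alpha^{\,n-1} + (n-1)\bar R_\alpha^{\,n}$, which decreases in $\bar R_\alpha$, so bisection applies. The sketch below works under the same assumptions as the one-limit version: floating point arithmetic and illustrative function names.

```python
def two_limit_probs(n, N):
    """P_2(N_0) of (9) for N_0 = 0, ..., N via the recursion (10); requires n >= 2."""
    p = [0.0] * (N + 1)
    p[N] = n * (n - 1) / ((N + n) * (N + n - 1))   # (9) at N_0 = N
    for N0 in range(N, 0, -1):
        p[N0 - 1] = N0 * (N - N0 + 2) / ((N - N0 + 1) * (N0 + n - 2)) * p[N0]  # (10)
    return p


def n_alpha_two_limits(n, N, alpha):
    """Largest N_alpha satisfying (11)."""
    p = two_limit_probs(n, N)
    tail = 0.0
    for N0 in range(N, -1, -1):
        tail += p[N0]
        if tail >= alpha:
            return N0
    return 0


def r_bar_two_limits(n, alpha, tol=1e-12):
    """Solve (12) by bisection, using its closed form 1 - n R^(n-1) + (n-1) R^n = alpha."""
    def prob_at_least(r):
        return 1.0 - n * r ** (n - 1) + (n - 1) * r ** n
    lo, hi = 0.0, 1.0                 # prob_at_least falls from 1 at r = 0 to 0 at r = 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(n_alpha_two_limits(10, 10, 0.99), r_bar_two_limits(50, 0.99))
```

Bisection is used only because it is simple and cannot fail on a monotone function. Any root finder would do.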


5. The Problem of Tolerance Limits for Two Quality Characteristics. We have thus far devoted our discussion to the problem of tolerance limits for a single quality characteristic. The problem of two or more quality characteristics can be treated by methods similar to those already used. The simplest case is that in which each product-piece under consideration is measured on two independent quality characteristics. Suppose the two characteristics are measured by $X$ and $Y$. Let a sample of $n$ product-pieces be taken, assuming a state of statistical control has been established, and let $X_1$ be the smallest of the $X$ values and $Y_1$ the smallest of the $Y$ values. The question with which we are


⁵ This limiting case in the problem of tolerance limits, as well as that expressed in (7) and other similar limiting cases, has been considered by the author in an earlier paper, "Determination of Sample Sizes for Setting Tolerance Limits," Annals of Math. Stat., Vol. XII (1941), pp. 91-96.





concerned here is the following: If $N$ further product-pieces are measured on $X$ and $Y$, what is the probability that $X > X_1$ and $Y > Y_1$ for $N_0$ of the pieces?

Let $X$ and $Y$ be statistically independent, and let $f(x)$ and $g(y)$ be the probability functions of $X$ and $Y$ respectively. Let $\int_{-\infty}^{X_1} f(x)\,dx = p$ and $\int_{-\infty}^{Y_1} g(y)\,dy = q$. The probability law of $p$ and $q$ is

$$n^2 (1 - p)^{n-1} (1 - q)^{n-1}\,dp\,dq. \tag{13}$$


Table II

Values of $N_\alpha$ and $R_\alpha$ for $\alpha = 0.99$ and $0.95$ for several combinations of values of $N$ and $n$, and for the problem of two tolerance limits. (For $N = \infty$, $R_\alpha$ is denoted by $\bar R_\alpha$.)

                       α = 0.99           α = 0.95
     n       N     N_.99    R_.99     N_.95    R_.95
    10      10        4     .400         5     .500
    10      20        8     .400        11     .550
    10       ∞        -     .490         -
    50      50       42     .840        44     .880
    50     100       85     .850        90     .900
    50       ∞        -     .874         -     .909
   100     100       89     .890        92     .920
   100     200      184     .920       188     .940
   100       ∞        -     .935         -     .953
   500     500      491     .982       494     .988
   500    1000      985     .985       989     .989
   500       ∞        -     .987         -     .991


In a further sample of size $N$, the probability that $X > X_1$ and $Y > Y_1$ for $N_0$ of the cases, $X_1$ and $Y_1$ being determined by the first sample, is

$$\binom{N}{N_0} [(1 - p)(1 - q)]^{N_0}\, [1 - (1 - p)(1 - q)]^{N - N_0}. \tag{14}$$

The joint probability law of $N_0$, $p$ and $q$ is given by the product of (13) and (14). Integrating this product with respect to $p$ and $q$, we obtain as the probability law of $N_0$

$$P_3(N_0) = n^2 \binom{N}{N_0} \sum_{i=0}^{N - N_0} \binom{N - N_0}{i} \frac{(-1)^i}{(n + N_0 + i)^2}. \tag{15}$$




























For given values of $N$, $n$ and $\alpha$ it is important, as before, to determine $N_\alpha$ as the largest integer for which

$$\sum_{N_0 = N_\alpha}^{N} P_3(N_0) \ge \alpha. \tag{16}$$

Setting $N_\alpha/N = R_\alpha$ and $\lim_{N \to \infty} R_\alpha = \bar R_\alpha$, one finds $\bar R_\alpha$ to be given by solving the following equation for $\bar R_\alpha$:

$$-n^2 \int_{\bar R_\alpha}^{1} \xi^{n-1} \log \xi\,d\xi = \alpha. \tag{17}$$

The expression $-n^2 \xi^{n-1} \log \xi\,d\xi$ is simply the probability, to within terms of order $d\xi$, that $\xi < \int_{X_1}^{\infty}\int_{Y_1}^{\infty} f(x)\,g(y)\,dx\,dy < \xi + d\xi$. The double integral is the proportion of the population pairs $(X, Y)$ for which $X > X_1$ and $Y > Y_1$.
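The alternating sum in (15) cancels badly in floating point as $N$ grows. The sketch below therefore evaluates it with exact rational arithmetic (`fractions.Fraction`). For (17) it uses the closed form $-n^2\int_R^1 \xi^{n-1}\log\xi\,d\xi = 1 - R^n + nR^n\log R$, obtained by integrating by parts, and solves for $\bar R_\alpha$ by bisection. Function names are illustrative, and $\alpha$ is read as a decimal (0.95 becomes $95/100$) for the exact comparison.

```python
from fractions import Fraction
from math import comb, log


def p3(n, N, N0):
    """Exact P_3(N_0) of (15): X > X_1 and Y > Y_1 for exactly N_0 of the N further pieces."""
    m = N - N0
    s = sum(Fraction((-1) ** i * comb(m, i), (n + N0 + i) ** 2) for i in range(m + 1))
    return n * n * comb(N, N0) * s


def n_alpha_p3(n, N, alpha):
    """Largest N_alpha satisfying (16)."""
    target = Fraction(str(alpha))
    tail = Fraction(0)
    for N0 in range(N, -1, -1):
        tail += p3(n, N, N0)
        if tail >= target:
            return N0
    return 0


def r_bar_p3(n, alpha, tol=1e-12):
    """Solve (17), whose left side equals 1 - R^n + n R^n log R, by bisection."""
    def prob_at_least(r):
        return 1.0 if r == 0.0 else 1.0 - r ** n + n * r ** n * log(r)
    lo, hi = 0.0, 1.0                 # prob_at_least decreases from 1 to 0 on [0, 1]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least(mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(n_alpha_p3(10, 10, 0.95), r_bar_p3(10, 0.95))
```

As one would expect, $\bar R_\alpha$ here is smaller than the single-characteristic value $(1-\alpha)^{1/n}$, since both conditions must hold at once.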

In the problem of two tolerance limits for each quality characteristic, as determined by an initial sample of size $n$, we calculate the probability that $N_0$ members of a further sample of size $N$ will fall within the two sets of tolerance limits with respect to the two characteristics. The problem is similar to that for one tolerance limit for each of two quality characteristics. For this case we find, corresponding to (15), (16) and (17) respectively, the following:

$$P_4(N_0) = n^2 (n - 1)^2 \binom{N}{N_0} \sum_{i=0}^{N - N_0} \binom{N - N_0}{i} \frac{(-1)^i}{(N_0 + n - 1 + i)^2\,(N_0 + n + i)^2}, \tag{18}$$

$$\sum_{N_0 = N_\alpha}^{N} P_4(N_0) \ge \alpha, \tag{19}$$

and

$$n^2 (n - 1)^2 \int_{\bar R_\alpha}^{1} \xi^{n-2}\,[2(\xi - 1) - (\xi + 1)\log \xi]\,d\xi = \alpha. \tag{20}$$

The derivations of results analogous to (15), (16), (17), (18), (19) and (20) are straightforward both for tolerance limits defined by order statistics other than the least and greatest, and for more than two independent⁶ quality characteristics.
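A sketch for (18) and (20) along the same lines follows. (18) is evaluated exactly. The integral in (20) is computed by Simpson's rule, which is an arbitrary numerical choice made here and not taken from the paper, and $\bar R_\alpha$ is then found by bisection. The sketch assumes $n \ge 3$, so that the integrand stays bounded at 0. Function names are illustrative.

```python
from fractions import Fraction
from math import comb, log


def p4(n, N, N0):
    """Exact P_4(N_0) of (18); each characteristic uses the limits given by its least and greatest values."""
    m = N - N0
    s = sum(Fraction((-1) ** i * comb(m, i),
                     (N0 + n - 1 + i) ** 2 * (N0 + n + i) ** 2) for i in range(m + 1))
    return (n * (n - 1)) ** 2 * comb(N, N0) * s


def tail_20(n, r, steps=2000):
    """Left side of (20), n^2 (n-1)^2 * integral_r^1 of the integrand, by Simpson's rule (steps even)."""
    def h(x):
        return 0.0 if x == 0.0 else x ** (n - 2) * (2 * (x - 1) - (x + 1) * log(x))
    width = (1.0 - r) / steps
    total = h(r) + h(1.0)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * h(r + j * width)
    return (n * (n - 1)) ** 2 * total * width / 3


def r_bar_20(n, alpha, tol=1e-9):
    """Solve (20) for R_bar by bisection; tail_20 decreases from 1 at r = 0 to 0 at r = 1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tail_20(n, mid) >= alpha:
            lo = mid
        else:
            hi = mid
    return lo


print(r_bar_20(10, 0.95))
```

The same integral could be done in closed form, as for (17). The numerical route is shown because it also works for other integrands of this kind.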


6. Further Remarks and Discussion. For a given set of tolerance limits on a random variable $X$ as determined by an initial sample of size $n$, we have discussed the problem of predicting, with a given degree of probability, at least what proportion of values of $X$ in a further sample (finite or indefinitely large) will lie between these tolerance limits. We have obtained theoretical results

⁶ In a paper to appear in a forthcoming issue of the Annals of Math. Stat., A. Wald has shown how to set up tolerance limits for the case of two or more statistically dependent variables.





which depend only on the assumption that $X$ is a continuous random variable with some probability element $f(x)\,dx$, where $f(x)$ is not assumed known.

It should be emphasized that the concept of a random variable is very broad, in the sense that $X$ may be a random variable determined as a result of calculations on other random variables. For example, $X$ may be the difference, product, or ratio of two random variables, or the average or any other "reasonable" function of several random variables which may be of interest in any given situation. Thus, on the basis of an initial sample of differences of two random variables, we may set up tolerance limits of differences and make predictions, for a given probability level, as to how many differences in a further sample of differences will lie between these tolerance limits. Similarly for products, ratios, and other functions of random variables.

From the point of view of practical application, we should again note that the mathematical assumption that $X$ is a random variable means that a state of statistical control as described in §1 must exist in the measurements to which the tolerance limit prediction theory is to be applied. In practice $X$ is often a discrete variable, i.e. one which can take on only certain isolated values. For example, if $X$ is the number of defective product-pieces in a drawing of one product-piece, $X$ is either 0 or 1, depending on whether the piece was non-defective or defective. Our theory would not be applicable to such a case. However, if we take as a new variable the average value of $X$ for several product-pieces, we then obtain a variable that is continuous enough for the tolerance limit theory to be applicable for all practical purposes.

Finally, we remark that although we have used, as concrete examples, situations
in mass production engineering, the notions of tolerance limits and predictions
within tolerance limits which have been discussed apply equally well to
situations in any branch of applied science where measurements are made and
used as a basis for predictions concerning future measurements.

7. Summary. After a state of statistical control has been established with
respect to a quality characteristic of product-pieces in mass production by the
standard statistical quality control methods developed and refined by Shewhart
and others, there remains the problem of determining the accuracy of predictions
as to how many future product-pieces will fall within tolerance limits
specified by measurements on product-pieces already produced under the given
state of control. This problem and some of its extensions are discussed in the
present paper.

More specifically, suppose an initial sample of n product-pieces, manufactured
under a given state of statistical control, are measured with respect to a given
quality characteristic. Let X be a variable which measures the given characteristic,
so that X has a definite value for each product-piece. Let $X_1$ be the
smallest and $X_n$ the largest value of X which occurs in the initial sample. Now
consider a further sample of size N. The following problems of prediction relating
to the second sample from information yielded by the initial sample are
considered: (1) What is the probability that at least $N_0$ values of X in the second
sample will exceed the lower limit $X_1$ set by the first sample? (2) What is the
probability that at least $N_0$ values of X in the second sample will lie between the
two tolerance limits $X_1$ and $X_n$ set by the first sample? (3) For given values
of n and N and α (e.g. .99 or .95), what is the largest integer $N_\alpha$ such that the
probability is at least α that $N_0 \ge N_\alpha$? (4) What is the limiting value of
$N_\alpha/N = R_\alpha$ as N increases indefinitely? Tables of values of $N_\alpha$ and $R_\alpha$ are given
for each of the two problems (1) and (2), for several important combinations of
values of n and N and for α = .99 and .95.
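Questions (1) to (3) can be answered numerically without knowing f(x). The sketch below is not Wilks' derivation and does not reproduce his tables. It assumes two standard distribution-free facts:

- the proportion of the population above $X_1$ follows a Beta(n, 1) law;
- the proportion between $X_1$ and $X_n$ follows a Beta(n - 1, 2) law.

Given that proportion, $N_0$ is binomial, so unconditionally $N_0$ is beta-binomial. The function names are hypothetical, and n must be at least 2.

```python
from math import exp, lgamma

def log_beta(a, b):
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def prob_n0(k, N, a, b):
    """Beta-binomial probability that exactly k of the N further values are covered."""
    log_choose = lgamma(N + 1) - lgamma(k + 1) - lgamma(N - k + 1)
    return exp(log_choose + log_beta(k + a, N - k + b) - log_beta(a, b))

def n_alpha(n, N, alpha, two_sided=True):
    """Largest integer N_alpha with Pr(N_0 >= N_alpha) >= alpha.

    two_sided=True:  problem (2), values between X_1 and X_n; coverage ~ Beta(n - 1, 2).
    two_sided=False: problem (1), values above X_1;          coverage ~ Beta(n, 1).
    """
    a, b = (n - 1, 2) if two_sided else (n, 1)
    tail = 0.0                      # Pr(N_0 >= k), accumulated from k = N downward
    for k in range(N, -1, -1):
        tail += prob_n0(k, N, a, b)
        if tail >= alpha:
            return k
    return 0

def r_alpha_one_sided(n, alpha):
    """Limit of N_alpha / N in problem (1): solves Pr(coverage >= R) = 1 - R**n = alpha."""
    return (1 - alpha) ** (1 / n)

print(n_alpha(n=50, N=200, alpha=0.95), n_alpha(n=50, N=200, alpha=0.95, two_sided=False))
```

For large N, the ratio `n_alpha(...) / N` settles near the corresponding lower quantile of the coverage proportion. That quantile is the limit $R_\alpha$ asked for in (4). `r_alpha_one_sided` gives the limit in closed form for problem (1).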

Problems similar to (1), (2) and (3) are discussed for the case in which tolerance
limits are placed on two or more quality characteristics simultaneously.
The generality of the theory of tolerance limits and how it applies to differences,
products and ratios and other functions of two or more random variables
are briefly discussed.



GENERALIZED POISSON DISTRIBUTION 

By F. E. Satterthwaite
Aetna Life Insurance Company 

1. Introduction. The Poisson distribution is one of the most fundamental
of statistical distributions. It is the distribution law for the number of events
if the probability of an event happening in any infinitesimal unit of time is
independent of the probability of its happening in any other unit of time.
Frequently when we analyze statistics which obey the Poisson law it is desirable to
give varying weights to the different events instead of considering them all of
equal value. Such is the case in analyzing insurance statistics, where the events
are the claims received by the office and the weights are the costs of the claims
to the company. We shall now show how the Poisson distribution can be
generalized so as to be adequate for such an analysis.

2. First development. Let f(x, α) be the distribution function of the weights
assigned to the events, where the variable, x, refers to the weight and the variable,
α, refers to time. The characteristic function of f(x, α) is

$$\phi(t,\alpha) = \int e^{itx} f(x,\alpha)\,dx.$$


Also let p(α) dα be the probability that an event will occur in the infinitesimal
unit of time, α to α + dα. If y represents the sum of the weights, the distribution
function of y for this unit of time is

$$F_{d\alpha}(y,\alpha) = \begin{cases} 1 - p(\alpha)\,d\alpha, & y = 0,\\ f(y,\alpha)\,p(\alpha)\,d\alpha, & y > 0. \end{cases} \tag{1}$$

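The characteristic function of this distribution is

$$\begin{aligned} \Phi_{d\alpha}(t,\alpha) &= e^{it\cdot 0}\,(1 - p(\alpha)\,d\alpha) + p(\alpha)\,d\alpha \int e^{ity} f(y,\alpha)\,dy\\ &= 1 - p(\alpha)\,d\alpha\,(1 - \phi(t,\alpha))\\ &= e^{-p(\alpha)\,d\alpha\,(1 - \phi(t,\alpha))}. \end{aligned} \tag{2}$$

In forming equations (1) and (2) we ignore infinitesimals of orders higher than
the first in dα.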

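The expected number of events in the period of time from $\alpha_1$ to $\alpha_2$ is

$$P = \int_{\alpha_1}^{\alpha_2} p(\alpha)\,d\alpha,$$

and the mean distribution of weights during the same period of time is

$$f(x) = \int_{\alpha_1}^{\alpha_2} \frac{p(\alpha)}{P}\, f(x,\alpha)\,d\alpha.$$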




The characteristic function of this mean distribution of weights is

$$\phi(t) = \int e^{itx} f(x)\,dx = \int_{\alpha_1}^{\alpha_2} \frac{p(\alpha)}{P}\,\phi(t,\alpha)\,d\alpha.$$

These equations are based on the assumption that the probability of an event 
occurring in any unit of time is independent of the probability of its occurrence 
in any other unit of time and also the assumption that the weights assigned to 
each event are independent. These assumptions are implied in all that follows. 

Since the characteristic function of the sum of independent variables is equal
to the product of the respective characteristic functions, the characteristic function
of the sum of the weights during the period of time, $\alpha_1$ to $\alpha_2$, is

$$\Phi(t) = \prod \Phi_{d\alpha}(t,\alpha) = e^{-\int p(\alpha)\,d\alpha + \int p(\alpha)\,\phi(t,\alpha)\,d\alpha} = e^{-P(1 - \phi(t))}. \tag{3}$$

Applying the Fourier transformation, the distribution function of the sum of
the weights is

$$F(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ity - P(1 - \phi(t))}\,dt.$$


Equation (3) gives a convenient method for defining a generalized Poisson
distribution. Any distribution which has a characteristic function of the form
$\Phi(t) = e^{-P(1-\phi(t))}$, where φ(t) is the characteristic function of an arbitrary distribution, will
have all the properties of a generalized Poisson distribution.
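Equation (3) can be checked numerically by multiplying the first-order factors of (2) over a fine grid of time intervals. In this sketch the rate p(α) and the weight law f(x, α) are hypothetical choices, made so that neither is constant. The weight law is exponential with a mean that drifts with α. The sketch computes P and the mean characteristic function φ(t) with the same midpoint rule. The product and $e^{-P(1-\phi(t))}$ should then agree up to the second-order terms dropped in (2).

```python
import cmath
import math

# Hypothetical ingredients, chosen only for illustration:
#   p(alpha)    = 2 + sin(2*pi*alpha) on 0 <= alpha <= 1   (event rate varying in time)
#   f(x, alpha) = exponential density with mean 1 + alpha  (weight law varying in time)

def rate(a):
    return 2.0 + math.sin(2.0 * math.pi * a)

def phi_weight(t, a):
    """Characteristic function of an exponential weight with mean 1 + a."""
    return 1.0 / (1.0 - 1j * t * (1.0 + a))

def product_of_unit_factors(t, steps=20_000):
    """Multiply the first-order factors 1 - p(a) da (1 - phi(t, a)) from equation (2)."""
    da = 1.0 / steps
    prod = 1.0 + 0j
    for k in range(steps):
        a = (k + 0.5) * da
        prod *= 1.0 - rate(a) * da * (1.0 - phi_weight(t, a))
    return prod

def equation_3(t, steps=20_000):
    """exp(-P (1 - phi(t))), with P and the mean characteristic function by quadrature."""
    da = 1.0 / steps
    P, weighted_phi = 0.0, 0j
    for k in range(steps):
        a = (k + 0.5) * da
        P += rate(a) * da
        weighted_phi += rate(a) * phi_weight(t, a) * da
    return cmath.exp(-P * (1.0 - weighted_phi / P))

print(abs(product_of_unit_factors(1.0) - equation_3(1.0)))
```

Only P and the weighted mean of φ(t, α) enter the closed form. The same check therefore illustrates the property, discussed under Properties, that variation of P and f(x) over the period does not matter.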


3. Second development. If we let φ(t) represent the characteristic function
of an arbitrary distribution, the characteristic function of the sum of n independent
items obeying such a distribution law is $\Phi_n(t) = [\phi(t)]^n$. If instead of
considering n to be a fixed quantity we assume that it is an independent statistical
variable obeying the Poisson distribution law with mean P, the characteristic
function of the sum, y, of the items of the sample becomes

$$\Phi(t) = \sum_{n=0}^{\infty} \frac{P^n\,[\phi(t)]^n\,e^{-P}}{n!} = e^{-P(1 - \phi(t))}.$$

Therefore y is seen to obey the generalized Poisson distribution law.
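The second development also suggests how to simulate a generalized Poisson variable: draw n from a Poisson law with mean P, then add n independent weights. The sketch below compares the empirical characteristic function of such draws with $e^{-P(1-\phi(t))}$. Two choices are assumptions made for convenience. The weights are exponential, with characteristic function $1/(1 - it\mu)$ for mean μ. The Poisson sampler uses the simple multiplication-of-uniforms method, which is adequate only for small P.

```python
import cmath
import math
import random

def poisson_draw(rng, mean):
    """Poisson variate by multiplying uniforms (Knuth's method; suitable for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def empirical_cf(t, P, weight_mean, samples=50_000, seed=7):
    """Average of exp(i t y), y the sum of a Poisson(P) number of exponential weights."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(samples):
        n = poisson_draw(rng, P)
        y = sum(rng.expovariate(1.0 / weight_mean) for _ in range(n))
        total += cmath.exp(1j * t * y)
    return total / samples

def generalized_poisson_cf(t, P, weight_mean):
    phi = 1.0 / (1.0 - 1j * t * weight_mean)      # exponential weight
    return cmath.exp(-P * (1.0 - phi))

print(empirical_cf(0.7, 3.0, 2.0), generalized_poisson_cf(0.7, 3.0, 2.0))
```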


4. Properties. The generalized Poisson distribution preserves the unique and
very important property of the Poisson distribution that nowhere in its development
is it necessary to make any assumptions regarding homogeneity. The
only requirement is that the occurrence of and weight assigned to any event
shall be independent of the occurrence of or weight assigned to any other event.

The distribution of the sum of the weights is a function of the expected number
of events, P, and of the mean distribution of weights, f(x), alone. It is independent
of the way in which P and f(x) are made up. Thus, if we are studying
the distribution of the sum of the weights over a period of a year and if P and
f(x) vary with the seasons, the distribution of y is no different than it would be
if P and f(x) were constant. It is only necessary that the f(x)'s for the different
seasons be weighted in proportion to the expected number of events in determining
the mean f(x).

Note also that in the first development it is not necessary that the variable, α,
refer to time. It could just as well refer to different classes of events distinguished
on any other basis. Therefore, heterogeneous material may be combined
in an analysis if it is possible to determine the appropriate mean distribution
of weights.

For a given weight distribution the generalized Poisson distribution for an
expected number of events, nP, is identical with the distribution of the sum of n
independent items each of which obeys a generalized Poisson distribution with
P expected events, since $[e^{-P(1-\phi(t))}]^n = e^{-nP(1-\phi(t))}$.

Because of the property described in the preceding paragraph it is immediately 
apparent that a generalized Poisson distribution obeys the law of large numbers. 
As the number of expected events increases the distribution approaches the 
normal distribution. 

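5. Moments. The moments of a generalized Poisson distribution are functions
of the moments of the underlying weight distribution. By differentiating
the characteristic function we obtain the following formulas. In them the pre-subscript,
o, refers to the moments about zero of the weight distribution, f(x); $\mu'_1$ is
the mean, and $\mu_2, \mu_3, \mu_4$ are the moments about the mean of the generalized
Poisson distribution:

$$\begin{aligned} \mu'_1 &= P\,{}_o\mu'_1,\\ \mu_2 &= P\,{}_o\mu'_2,\\ \mu_3 &= P\,{}_o\mu'_3,\\ \mu_4 &= P\,{}_o\mu'_4 + 3\,(P\,{}_o\mu'_2)^2. \end{aligned}$$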

The above formulas may be verified through general reasoning by considering
the moments of the distribution, $F_{d\alpha}(y, \alpha)$ (see equation (1)). This distribution
refers to an infinitesimal unit of time, and all the moments about zero are
infinitesimals of the first order. In passing from the moments about zero to
the moments about the mean the corrections are all infinitesimals of at least the
second order. Therefore, the corrections may be ignored and the moments
about the mean may be considered to be equal to those about zero. The above
formulas follow if we take a sample of size $P/(p(\alpha)\,d\alpha)$ from this population.
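The moment formulas can be confirmed exactly when the weights take whole-number values. The sketch below computes the full generalized Poisson distribution with the Panjer recursion and compares its moments with the four formulas above. The Panjer recursion is a standard technique from later actuarial literature; it is not used in this paper. The weight distribution and P are hypothetical.

```python
import math

def compound_poisson_pmf(P, weights, kmax):
    """Pr(y = k) for k = 0..kmax, y the sum of a Poisson(P) number of independent
    whole-number weights with Pr(weight = j) = weights[j] (j >= 1).
    Panjer recursion: g_k = (P / k) * sum_j j * f_j * g_{k-j}."""
    g = [0.0] * (kmax + 1)
    g[0] = math.exp(-P)                   # no events (every weight is at least 1)
    for k in range(1, kmax + 1):
        g[k] = P / k * sum(j * fj * g[k - j] for j, fj in weights.items() if j <= k)
    return g

def mean_and_central_moments(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    mu2, mu3, mu4 = (sum((k - mean) ** r * p for k, p in enumerate(pmf)) for r in (2, 3, 4))
    return mean, mu2, mu3, mu4

P = 2.5
weights = {1: 0.5, 2: 0.3, 4: 0.2}        # hypothetical weight distribution f(x)
o = {r: sum(j**r * fj for j, fj in weights.items()) for r in (1, 2, 3, 4)}  # moments about zero

observed = mean_and_central_moments(compound_poisson_pmf(P, weights, kmax=300))
predicted = (P * o[1], P * o[2], P * o[3], P * o[4] + 3 * (P * o[2]) ** 2)
print(observed)
print(predicted)
```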

In order to obtain Pearson's moment functions for a generalized Poisson
distribution for any given mean value, it is convenient to calculate the following
parameters of the weight distribution:

$$\begin{aligned} {}_om &= {}_o\mu'_1,\\ {}_o\alpha &= {}_o\mu'_2/{}_o\mu'_1,\\ {}_o\beta_1 &= ({}_o\mu'_3/{}_o\mu'_2)^2/{}_o\alpha,\\ {}_o(\beta_2 - 3) &= ({}_o\mu'_4/{}_o\mu'_2)/{}_o\alpha. \end{aligned} \tag{4}$$

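The Pearson moment functions then take the convenient forms:

$$\begin{aligned} \sigma^2/m^2 &= {}_o\alpha/m,\\ \beta_1 &= {}_o\beta_1/m,\\ \beta_2 - 3 &= {}_o(\beta_2 - 3)/m, \end{aligned} \tag{5}$$

where $m = P\,{}_om$ is the mean of the generalized Poisson distribution.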

6. Further generalizations. Often the expected number of events is not
known but can be estimated to a greater or less degree of accuracy. In such a
case it is convenient to assume that P is a statistical variable distributed about
some expected value, say P'. A Type III distribution,

$$q(P) = \frac{1}{\Gamma(b)}\left(\frac{b}{P'}\right)^{b} P^{\,b-1}\, e^{-bP/P'},$$

will generally be as satisfactory as any to assume for P. The parameter, b,
can be chosen to give any desired standard deviation, since the standard deviation of P is $P'/\sqrt{b}$.
The characteristic function of the distribution of the sum of the weights under these conditions becomes

$$\Phi(t) = \int_0^\infty e^{-P(1-\phi(t))}\,q(P)\,dP = \left[1 + \frac{P'}{b}\,(1 - \phi(t))\right]^{-b}.$$
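The closed form for Φ(t) can be checked by averaging $e^{-P(1-\phi(t))}$ over values of P drawn from q(P), which is a gamma law with shape b and mean P'. In later terminology, mixing the Poisson law over such a gamma law makes the number of events negative binomial. The exponential weights with mean 1 and the values P' = 4, b = 2 are hypothetical. Python's `random.gammavariate(alpha, beta)` takes a shape and a scale.

```python
import cmath
import random

def phi_exponential(t, mean=1.0):
    """Characteristic function of an exponential weight (hypothetical weight law)."""
    return 1.0 / (1.0 - 1j * t * mean)

def mixed_cf_closed_form(t, P_prime, b):
    return (1.0 + (P_prime / b) * (1.0 - phi_exponential(t))) ** (-b)

def mixed_cf_monte_carlo(t, P_prime, b, samples=100_000, seed=3):
    """Average of exp(-P (1 - phi(t))) with P drawn from the Type III law q(P)."""
    rng = random.Random(seed)
    z = 1.0 - phi_exponential(t)
    total = 0j
    for _ in range(samples):
        P = rng.gammavariate(b, P_prime / b)   # shape b, scale P'/b, so E[P] = P'
        total += cmath.exp(-P * z)
    return total / samples

print(mixed_cf_monte_carlo(0.5, 4.0, 2.0), mixed_cf_closed_form(0.5, 4.0, 2.0))
```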


The second development suggests another generalization. Instead of assuming
that the number of events, n, is distributed in accord with the Poisson
distribution, we may assume any discrete, non-negative distribution, h(n).
The distribution function for the sum of the weights is then

$$F'(y) = \sum_{n} h(n)\, f(y, n),$$

where f(y, n) is the distribution function for the sum of n independent weights.
The variance, σ², of this distribution is given by the formula

$$\frac{\sigma^2}{m^2} = \frac{{}_n\sigma^2}{{}_nm^2} + \frac{1}{{}_nm}\,\frac{{}_o\sigma^2}{{}_om^2},$$

where m refers to the mean, the pre-subscript n refers to the distribution h(n), and o refers to the
weight distribution. Some writers have assumed that statistics of this type are
distributed as a product. Such an assumption is incorrect and causes an overstatement
of σ²/m² by the amount $\frac{1}{{}_nm}\cdot\frac{{}_n\sigma^2}{{}_nm^2}\cdot\frac{{}_o\sigma^2}{{}_om^2}$.
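The variance formula can be verified exactly for small discrete examples. The sketch builds the distribution of y from h(n) by repeated convolution of the weight law and uses rational arithmetic, so the comparison is exact. Both h(n) and the weight law are hypothetical. The check concerns only the formula for σ²/m², not the product assumption discussed afterwards.

```python
from fractions import Fraction as Fr

def convolve(a, b):
    """Law of the sum of two independent discrete variables given as {value: probability}."""
    out = {}
    for x, px in a.items():
        for y, py in b.items():
            out[x + y] = out.get(x + y, 0) + px * py
    return out

def mean_and_variance(dist):
    m = sum(x * p for x, p in dist.items())
    return m, sum((x - m) ** 2 * p for x, p in dist.items())

def random_sum(h, w):
    """Law of y = w_1 + ... + w_n, with n ~ h independent of the i.i.d. weights w_i."""
    total, partial = {}, {0: Fr(1)}     # `partial` is the law of a sum of n weights
    for n in range(max(h) + 1):
        if n > 0:
            partial = convolve(partial, w)
        for y, p in partial.items():
            total[y] = total.get(y, 0) + h.get(n, 0) * p
    return total

h = {0: Fr(1, 5), 1: Fr(3, 10), 2: Fr(3, 10), 3: Fr(1, 5)}   # hypothetical h(n)
w = {1: Fr(1, 2), 2: Fr(1, 3), 5: Fr(1, 6)}                  # hypothetical weight law

m, var = mean_and_variance(random_sum(h, w))
n_m, n_var = mean_and_variance(h)
o_m, o_var = mean_and_variance(w)
formula = n_var / n_m**2 + (1 / n_m) * o_var / o_m**2
print(var / m**2, formula)
```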


7. Application. In Table I is shown the distribution of claims under a certain
plan of group sickness and accident insurance. The parameters, (4), for
this distribution are

$${}_om = 3.62,\quad {}_o\alpha = 8.1,\quad {}_o\beta_1 = 14,\quad {}_o(\beta_2 - 3) = 15. \tag{6}$$

This distribution is in terms of weeks per claim. The insurance company is
interested in the financial cost per claim. A study shows that the distribution
of the rate of weekly indemnity to which different classes of employees are
entitled has the average parameters

$${}_om = 15.25,\quad {}_o\alpha = 16.5,\quad {}_o\beta_1 = 20,\quad {}_o(\beta_2 - 3) = 25. \tag{7}$$


Since the moment about zero of the product of independent statistics is equal
to the product of the moments, and each parameter in (4) is built from ratios of moments
about zero, it is permissible to multiply together the corresponding parameters of (6) and (7)
to obtain the average parameters for the distribution of the financial cost per claim. These are

$${}_om = 55.2,\quad {}_o\alpha = 134,\quad {}_o\beta_1 = 280,\quad {}_o(\beta_2 - 3) = 375.$$

TABLE I

Nearest Duration     Number of Claims per Year
of Claim in Weeks    per 10,000 Employees
        0                    197
        1                    418
        2                    173
        3                    109
        4                     84
        5                     58
        6                     45
        7                     36
        8                     27
        9                     24
       10                     20
       11                     17
       12                     14
       13                    128
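The arithmetic of this application can be retraced from Table I. The sketch below is an illustration, not the author's worksheet. It does three things:

- computes the parameters (4) from the frequencies of Table I, giving roughly 3.62, 8.11, 13.8 and 15.2, in agreement with (6);
- multiplies the published values of (6) and (7);
- applies (5) for $180 of expected claims.

It treats the 13-week entry as exactly 13 weeks, which is an assumption about how the table was grouped.

```python
import math

# Table I: nearest duration of claim in weeks -> claims per year per 10,000 employees
table_1 = {0: 197, 1: 418, 2: 173, 3: 109, 4: 84, 5: 58, 6: 45,
           7: 36, 8: 27, 9: 24, 10: 20, 11: 17, 12: 14, 13: 128}

def weight_parameters(freq):
    """Parameters (4) from the moments about zero of a frequency table."""
    total = sum(freq.values())
    mu = {r: sum(x**r * c for x, c in freq.items()) / total for r in (1, 2, 3, 4)}
    alpha = mu[2] / mu[1]
    return {"m": mu[1], "alpha": alpha,
            "beta1": (mu[3] / mu[2]) ** 2 / alpha,
            "beta2_minus_3": (mu[4] / mu[2]) / alpha}

weeks = weight_parameters(table_1)            # compare with (6)

# Published (6) and (7); parameters of a product of independent variables multiply.
published_6 = {"m": 3.62, "alpha": 8.1, "beta1": 14, "beta2_minus_3": 15}
published_7 = {"m": 15.25, "alpha": 16.5, "beta1": 20, "beta2_minus_3": 25}
cost = {k: published_6[k] * published_7[k] for k in published_6}

expected_cost = 180.0                          # dollars of claims expected per policy
pearson_8 = {"sigma2_over_m2": cost["alpha"] / expected_cost,   # equations (5)
             "beta1": cost["beta1"] / expected_cost,
             "beta2_minus_3": cost["beta2_minus_3"] / expected_cost}
P = expected_cost / cost["m"]
prob_no_claims = math.exp(-P)
print(weeks, pearson_8, P, prob_no_claims)
```

With the unrounded P ≈ 3.26, the probability of no claims is about .038. The text rounds P to 3.3 and obtains .037.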

In order to study the distribution of cost under a group of policies for each of
which $180 in claims is expected, we apply equations (5) to obtain the parameters

$$\sigma^2/m^2 = .74,\quad \beta_1 = 1.6,\quad \beta_2 - 3 = 2.1. \tag{8}$$

Since the expected number of claims is

$$P = 180/55.2 = 3.3,$$

the probability that there will not be any claims under a policy is

$$p(0) = \frac{1}{0!}\,(3.3)^0\, e^{-3.3} = .037.$$







Adjusting the parameters, (8), to remove the zero claims and choosing the scale
so as to express the results as loss ratios gives the parameters

$$m = 61.6\%,\quad \sigma = 52.8\%,\quad \beta_1 = 1.57,\quad \beta_2 = 4.90.$$

A Pearson Type I curve fitted to these parameters intersects the axis well below
the zero point. Therefore $\beta_2$ was reduced to 4.59, which gives the expected
distribution shown in Table II.

Table II also shows the actual distribution of loss ratios experienced by one
of the larger group insurance carriers under policies in this class.

TABLE II

Experience under Group Sickness and Accident Insurance Policies

Ratio of Losses        Number of Policies
to Premiums          Expected      Actual
0                       18           11
.01-.09                 47           37
.10-.19                 53           45
.20-.29                  …           56
.30-.39                 45           38
.40-.49                 41           47
.50-.59                 36           39
.60-.69                 32           41
.70-.79                 28           37
.80-.89                 24           20
.90-.99                 21           29
1.00-1.19               32           30
1.20-1.39               23           22
1.40-1.59               17           22
1.60-1.99               19           14
2.00 and over           11            9

The chi-square test for goodness of fit gives

$$\chi^2 = 23, \qquad 14 \text{ degrees of freedom},$$

which corresponds to a probability of about 5 per cent. Thus it is apparent that
theory and experience are in fair agreement, considering that no allowance was
made for the lack of homogeneity “between policies.” (This should not be
confused with the homogeneity “within policies” covered in the theory.)
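The probability quoted for χ² = 23 with 14 degrees of freedom can be reproduced without tables. For an even number of degrees of freedom, 2k, the upper tail of the chi-square law equals the probability that a Poisson variable with mean x/2 is at most k - 1. This is a standard identity, used in the sketch below; the function name is hypothetical. It gives about 0.06, close to the 5 per cent stated.

```python
import math

def chi_square_upper_tail_even_df(x, df):
    """Pr(chi-square with df degrees of freedom > x), for even df only,
    using Pr(chi2_{2k} > x) = Pr(Poisson(x / 2) <= k - 1)."""
    if df % 2:
        raise ValueError("this shortcut requires an even number of degrees of freedom")
    lam = x / 2.0
    term, total = math.exp(-lam), 0.0
    for j in range(df // 2):
        total += term                 # adds Pr(Poisson(lam) = j)
        term *= lam / (j + 1)
    return total

print(round(chi_square_upper_tail_even_df(23, 14), 3))
```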

If the expected number of events is small, especially if the weight distribution 
is irregular or discrete, it is sometimes advisable to use the following method. 

1. Use summation or approximate integration to obtain the distribution,
f(y, n), of the sum of n independent weights for n = 1, 2, 3, and 4. The
formula is

$$f(y, n+1) = \int_0^y f(x)\, f(y - x, n)\,dx.$$

2. Determine the generalized Poisson distribution for P, the expected number
of events, equal to some small number, say ½. The formula is

$$F(y, P) = \sum_{n=0}^{\infty} \frac{P^n e^{-P}}{n!}\, f(y, n).$$



Fig. 1. Surgical Fee Insurance. Solid line: distribution, f(y, n), of the sum of n independent
claims. Broken line: distribution, F(y, P), of the sum of the claims when P claims are expected.
The average claim is $50.

Example: If the expected claims under a policy are $100 (P = 2) and if the actual claims
are $400, the probability of an experience as bad as this occurring because of chance factors
is 0.1%.


3. Use summation or approximate integration to obtain F(y, P) for P = 1, 2, 4, ...
by the formula

$$F(y, 2P) = \int_0^y F(x, P)\, F(y - x, P)\,dx.$$

4. If the calculations are carried on from both tails and if the results are
plotted on probability graph paper, it is often possible to fill in the central sections
by interpolation. Such interpolations should be adjusted to reproduce the
correct mean. This method is illustrated in Fig. 1 in the case of surgical fee
insurance.
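Steps 1 to 3 can be sketched on a discrete grid, where the integrals become sums. In the sketch below the weights are assumed to take whole-number values from 1 to 5, with hypothetical probabilities. Step 2 uses only the terms n ≤ 4, and repeated doubling carries P = ½ up to P = 4. The result is compared with a direct evaluation of the series for P = 4 using many more terms. Step 4, interpolation on probability paper, is not modelled.

```python
import math

SIZE = 200   # grid y = 0, 1, ..., SIZE - 1, in whole units of weight (hypothetical)

def convolve(a, b):
    """Discrete analogue of the integrals in steps 1 and 3, truncated to the grid."""
    out = [0.0] * SIZE
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j >= SIZE:
                break
            out[i + j] += ai * bj
    return out

def poisson_series(P, f1, nmax):
    """Step 2: F(y, P) = sum over n <= nmax of P^n e^-P / n! * f(y, n)."""
    fn = [1.0] + [0.0] * (SIZE - 1)           # f(y, 0): all mass at y = 0
    F = [0.0] * SIZE
    for n in range(nmax + 1):
        c = math.exp(-P) * P**n / math.factorial(n)
        F = [Fy + c * fy for Fy, fy in zip(F, fn)]
        fn = convolve(fn, f1)                 # step 1: f(y, n + 1) from f(y, n)
    return F

def doubling_method(P_small, f1, doublings):
    """Steps 1-3: series with n <= 4 for a small P, then F(y, 2P) = F(., P) * F(., P)."""
    F = poisson_series(P_small, f1, nmax=4)
    for _ in range(doublings):
        F = convolve(F, F)
    return F

f1 = [0.0, 0.4, 0.25, 0.15, 0.1, 0.1]          # hypothetical weight law on 1..5 units
approx = doubling_method(0.5, f1, doublings=3)  # P = 0.5 -> 1 -> 2 -> 4
direct = poisson_series(4.0, f1, nmax=60)
print(max(abs(a - d) for a, d in zip(approx, direct)), 1 - sum(approx))
```

The terms dropped in step 2 are all non-negative, so the doubled result can only understate the true distribution. With these settings the total mass lost is about 0.0014.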

8. Summary. In this paper the Poisson distribution is generalized to allow
for the assignment of varying weights to events when the number of events
follows the Poisson law. The ability of the Poisson distribution to handle
heterogeneous data is preserved in the generalization. An example is given
showing that the distribution of certain insurance statistics agrees with that
predicted by the theory.



THE CONSTRUCTION OF ORTHOGONAL LATIN SQUARES¹

By Henry B. Mann²
Columbia University 

A Latin square is an arrangement of m variables $x_1, x_2, \ldots, x_m$ into m rows
and m columns such that no row and no column contains any of the variables
twice. Two Latin squares are called orthogonal if, when one is superimposed
upon the other, every ordered pair of variables occurs once in the resulting
square.

The rows of a Latin square are permutations of the row $x_1, x_2, \ldots, x_m$. Let
$P_i$ be the permutation which transforms $x_1, x_2, \ldots, x_m$ into the ith row of the
Latin square. Then $P_iP_j^{-1}$ leaves no variable unchanged for $i \ne j$, for otherwise
one column would contain a variable twice. On the other hand each set of
m permutations $P_1, P_2, \ldots, P_m$ such that $P_iP_j^{-1}$ ($i \ne j$) leaves no variable unchanged
generates a Latin square. We may therefore identify every Latin square with
a set of m permutations $(P_1, P_2, \ldots, P_m)$ such that $P_iP_j^{-1}$ leaves no variable
unchanged for $i \ne j$.
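The identification of a Latin square with a set of permutations can be checked mechanically for small m. In the sketch below, row i is read as the permutation taking the column index c to the symbol in that column, with symbols 0, ..., m - 1. The code composes in ordinary function order; this does not affect whether a fixed point exists. It compares the permutation criterion with a direct row-and-column check on every 3 × 3 array whose rows are permutations.

```python
from itertools import combinations, permutations, product

def is_latin_direct(rows):
    """No symbol repeated in any row or column (symbols 0..m-1)."""
    m = len(rows)
    cols = [[r[c] for r in rows] for c in range(m)]
    return all(sorted(line) == list(range(m)) for line in list(rows) + cols)

def is_latin_via_permutations(rows):
    """Row i is the permutation P_i: c -> rows[i][c]; the array is Latin iff every
    row is a permutation and P_i followed by P_j^{-1} has no fixed point (i != j)."""
    m = len(rows)
    if any(sorted(r) != list(range(m)) for r in rows):
        return False
    for Pi, Pj in combinations(rows, 2):
        inv_j = {s: c for c, s in enumerate(Pj)}
        if any(inv_j[Pi[x]] == x for x in range(m)):   # fixed point: column x repeats
            return False
    return True

cases = list(product(permutations(range(3)), repeat=3))
print(sum(is_latin_via_permutations(r) for r in cases))
```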

Now let $(P_1, P_2, \ldots, P_m)$, $(Q_1, Q_2, \ldots, Q_m)$ be a pair of orthogonal
Latin squares. We shall show that $(P_1^{-1}Q_1, P_2^{-1}Q_2, \ldots, P_m^{-1}Q_m)$ is a Latin
square. $P_i^{-1}Q_i$ is the transformation which transforms the ith row of
$(P_1, P_2, \ldots, P_m)$ into the ith row of $(Q_1, Q_2, \ldots, Q_m)$. Since every pair of
variables occurs exactly once if the second square is superimposed upon the first,
the square $(P_1^{-1}Q_1, P_2^{-1}Q_2, \ldots, P_m^{-1}Q_m)$ contains for every i and k a permutation
which transforms $x_i$ into $x_k$. But then it cannot contain two permutations
which transform $x_i$ into $x_k$, since its m permutations account for only $m^2$ such pairs.
This argument can be reversed, and it follows
that $(P_1, P_2, \ldots, P_m)$ and $(Q_1, Q_2, \ldots, Q_m)$ are orthogonal if and only if
$(P_1^{-1}Q_1, P_2^{-1}Q_2, \ldots, P_m^{-1}Q_m)$ is a Latin square.
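The criterion can be tried on small squares. The sketch below forms the square whose ith row is the permutation carrying row i of A into row i of B, and tests whether it is Latin. The test family $L_k[i][j] = ki + j \pmod 5$ is a hypothetical choice, picked because $L_k$ and $L_l$ are orthogonal exactly when k ≠ l.

```python
from itertools import product

def is_latin(rows):
    m = len(rows)
    lines = [list(r) for r in rows] + [[r[c] for r in rows] for c in range(m)]
    return all(sorted(line) == list(range(m)) for line in lines)

def orthogonal(A, B):
    """Superimposing A and B gives every ordered pair of symbols exactly once."""
    m = len(A)
    return len({(A[i][c], B[i][c]) for i in range(m) for c in range(m)}) == m * m

def quotient_square(A, B):
    """Row i is the permutation taking row i of A into row i of B: A[i][c] -> B[i][c]."""
    m = len(A)
    return [[B[i][A[i].index(s)] for s in range(m)] for i in range(m)]

def cyclic_square(k, m):
    """Hypothetical test family: L_k[i][j] = k*i + j (mod m)."""
    return [[(k * i + j) % m for j in range(m)] for i in range(m)]

for k, l in product(range(1, 5), repeat=2):
    A, B = cyclic_square(k, 5), cyclic_square(l, 5)
    print(k, l, orthogonal(A, B), is_latin(quotient_square(A, B)))
```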

Denote now by an m-sided square S any set of m permutations
$(S_1, S_2, \ldots, S_m)$ and by the product $SS'$ of two squares S and S' the square
$(S_1S'_1, S_2S'_2, \ldots, S_mS'_m)$. Then we can state: Two Latin squares $L_1$ and $L_2$
are orthogonal if and only if there exists a Latin square $L_{12}$ such that

$$L_1L_{12} = L_2. \tag{1}$$

Now let $L_1, L_2, \ldots, L_r$ be a set of r mutually orthogonal Latin squares.
Then we must have $L_iL_{ik} = L_k$, where $L_{ik}$ is a Latin square if $i \ne k$. Hence we
have the theorem

Theorem 1: The Latin squares $L_1, L_2, \ldots, L_r$ are orthogonal if and only
if there exist $r(r - 1)$ Latin squares $L_{ik}$ ($i \ne k$) such that $L_iL_{ik} = L_k$.

Corollary: If L, $L^k$ and $L^{k-1}$ are Latin squares then L is orthogonal to $L^k$.

For instance if L and $L^2$ are Latin squares then L is orthogonal to $L^2$.

¹ Presented to the Mathematical Society October 31st, 1942. After I submitted this
paper for publication Dr. Edward Fleisher sent me his thesis on Eulerian squares, which
he submitted in 1934 and in which he proved Theorem 3 in a different manner.

² Research under a grant in aid of the Carnegie Corporation of New York.



If $A = (A_1, A_2, \ldots, A_m)$ and P is any permutation then we put $PA =
(PA_1, PA_2, \ldots, PA_m)$ and $AP = (A_1P, A_2P, \ldots, A_mP)$. If A is a Latin
square then AP and PA are also Latin squares. If A is orthogonal to B then
AP is orthogonal to BQ for any permutations P and Q. For if $AC = B$ then
$AP(P^{-1}CQ) = BQ$, since the associative law holds for the operations indicated,
and $P^{-1}CQ$ is again a Latin square.
This means that A and B remain orthogonal if we permute the variables in
both squares in any arbitrary way.

Hence if A is orthogonal to B, then $AA_1^{-1}$ is also orthogonal to $BB_1^{-1}$. We can
therefore, while preserving orthogonality, always transform the pair A and B
so that $A_1 = B_1 = 1$, where 1 denotes the identity. We shall then say that
the pair A, B is written in the reduced form.

Definition 1. If A is orthogonal to B, and if in the reduced form the permutations
of A are the same as those of B in a different order, and if these permutations
form a group G, then the pair A and B is said to be based on the group G.

A pair of orthogonal Latin squares is called a Graeco-Latin square. The
Graeco-Latin squares constructed by Bose [1], Stevens [2] and Fisher and Yates
[3] are all based on groups. There exist Graeco-Latin squares, however, which
are not based on a group.

If the orthogonal pair A, B is based on a group G and if $AC = B$, then C also
contains only permutations of G, and since C is a Latin square it must contain
all the permutations of G. Calling $C_i$ the image of $A_i$, we obtain a biunique
mapping S of G into itself. Let $A_i^S = C_i$; then $B_i = A_iA_i^S$, and S has therefore
the property that every element of G is of the form $XX^S$ where X is in G.

Definition 2: A biunique mapping S of a group G into itself will be called
a complete mapping if every element of G can be represented in the form $XX^S$, where
X is an element of G and $X^S$ the image of X under the mapping S.

If an abstract group G of order m admits a complete mapping S then we can
immediately construct an m-sided Graeco-Latin square based on G. To do this
we represent G as a regular permutation group. Let $P_1, P_2, \ldots, P_m$ be the
permutations of this representation. Then $A = (P_1, P_2, \ldots, P_m)$, $C =
(P_1^S, P_2^S, \ldots, P_m^S)$ and $B = (P_1P_1^S, P_2P_2^S, \ldots, P_mP_m^S)$ are Latin squares;
hence A is orthogonal to B, and $AP_1^{-1}$ and $B(P_1P_1^S)^{-1}$ form a reduced pair.
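For a cyclic group the construction is easy to carry out. This sketch takes G to be the integers mod m under addition and assumes a complete mapping S is supplied. It builds A from the permutations x → x + g and B from x → x + g + S(g). For odd m the identity mapping S(g) = g is complete, since g → 2g is then a bijection. The paper notes the same later for every group of odd order.

```python
def is_complete(m, S):
    """S is a complete mapping of Z_m: S and g -> g + S(g) are both bijections."""
    return (len({S(g) % m for g in range(m)}) == m
            and len({(g + S(g)) % m for g in range(m)}) == m)

def graeco_latin_from_complete_mapping(m, S):
    """Rows of A: x -> x + g; rows of B: x -> x + g + S(g). Returns rows of (A, B) pairs."""
    if not is_complete(m, S):
        raise ValueError("S is not a complete mapping of Z_m")
    return [[((x + g) % m, (x + g + S(g)) % m) for x in range(m)] for g in range(m)]

def is_graeco_latin(square):
    m = len(square)
    for k in (0, 1):                          # each component must be a Latin square
        rows = [[cell[k] for cell in row] for row in square]
        cols = [[row[c] for row in rows] for c in range(m)]
        if any(len(set(line)) != m for line in rows + cols):
            return False
    return len({cell for row in square for cell in row}) == m * m   # orthogonality

for row in graeco_latin_from_complete_mapping(5, lambda g: g):
    print(row)
```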

If $L_1, L_2, \ldots, L_r$ are orthogonal Latin squares and $L_iL_{ik} = L_k$, then we
form the product

$$L_1L_{12}L_{23}\cdots L_{r-1,r}. \tag{2}$$

From $L_iL_{ik} = L_k$ and $L_kL_{kj} = L_j$ we find $L_iL_{ik}L_{kj} = L_j$, and hence $L_{ik}L_{kj} = L_{ij}$.
$L_{ik}$ is therefore orthogonal to $L_{ij}$. The product (2) has the property that for
any $s \le r$ the product of s successive factors is a Latin square. On the other
hand, if a product of r Latin squares $L_1, L_{12}, \ldots, L_{r-1,r}$ has this property, then
the Latin squares $L_1, L_2, \ldots, L_r$, where $L_i = L_1L_{12}L_{23}\cdots L_{i-1,i}$, are orthogonal.

Definition 3: A set of r orthogonal Latin squares will be called based on a 
group G if every pair in the set is based on G. 

If $L_1, L_2, \ldots, L_r$ are based on a group G then G must admit r mappings
$S_1 = 1, S_2, \ldots, S_r$ into itself such that every element of G can be written in
the form $X^{S_i + S_{i+1} + \cdots + S_{i+h}}$ for every i and h with $1 \le i \le r$ and $0 \le h \le r - i$,
where $X^{S_h + S_k} = X^{S_h}X^{S_k}$ and $X^S$ is the image of X under the mapping S.

Definition 4: The mappings $S_1 = 1, S_2, \ldots, S_r$ of a group G into itself
will be called r-fold complete if every element of G is of the form $X^{S_i + S_{i+1} + \cdots + S_{i+h}}$
for every i and h with $1 \le i \le r$ and $0 \le h \le r - i$.

Now let G be an abstract group of order m admitting an r-fold complete set
of mappings $S_1 = 1, S_2, \ldots, S_r$. Put

$$L_i = \left(P_1^{S_1 + S_2 + \cdots + S_i},\ P_2^{S_1 + S_2 + \cdots + S_i},\ \ldots,\ P_m^{S_1 + S_2 + \cdots + S_i}\right),$$

where $P_1 = 1, P_2, \ldots, P_m$ is a regular representation of G. Then $L_1, L_2, \ldots, L_r$
is a set of r orthogonal Latin squares based on G. Put $A_i = P_1^{S_1 + S_2 + \cdots + S_i}$; then
$L_1A_1^{-1}, L_2A_2^{-1}, \ldots, L_rA_r^{-1}$ are written in the reduced form. Hence we have

Theorem 2: A set of r orthogonal Latin squares based on a group G exists if
and only if G admits an r-fold complete set of mappings.

If G is of order $m = 4a + 2 = 2m'$ then G has a self-conjugate subgroup
H of order m'. Suppose G admits a complete mapping S. We have

$$G = H + HA,$$

where A is any element of G not in H.
$XX^S \in H$ if either X and $X^S$ or neither of them are in H. Further $XX^S \in HA$
if either X or $X^S$, but not both of them, are in H.

Let a be the number of elements $X \in H$ such that $X^S \in H$,
b the number of elements $X \in H$ such that $X^S \in HA$,
c the number of elements $X \in HA$ such that $X^S \in H$;
then $a + b = m' = a + c$, since H has m' elements and S is biunique. Of the products
$XX^S$ exactly $b + c$ are in HA, and since S is complete these products fill HA.
Hence $b + c = m'$, $a = b$, and therefore $m' = 2a$, which is impossible since m'
is odd. We have therefore:

Theorem 3: No $(4a + 2)$-sided Graeco-Latin square based on a group can
exist.
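Theorem 3 can be confirmed by exhaustive search for the smallest cases. The sketch below tries every bijection S of a small group and tests whether X → X·S(X) is also a bijection. It finds none for the groups of order 2 and 6 tried: the cyclic groups of orders 2 and 6 and the symmetric group on three letters. As a control, it finds one for the cyclic group of order 5. Brute force is feasible only for very small orders.

```python
from itertools import permutations

def has_complete_mapping(elements, mul):
    """True if some bijection S of the group makes X -> X * S(X) a bijection as well."""
    n = len(elements)
    return any(len({mul(x, s) for x, s in zip(elements, image)}) == n
               for image in permutations(elements))

def cyclic_group(m):
    return list(range(m)), (lambda a, b: (a + b) % m)

def symmetric_group_3():
    elements = list(permutations(range(3)))
    return elements, (lambda p, q: tuple(p[q[i]] for i in range(3)))   # p after q

for name, group in [("Z2", cyclic_group(2)), ("Z6", cyclic_group(6)),
                    ("S3", symmetric_group_3()), ("Z5", cyclic_group(5))]:
    print(name, has_complete_mapping(*group))
```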

If a group G admits r automorphisms $T_1 = 1, T_2, \ldots, T_r$ such that $X^{T_i} \ne
X^{T_j}$ for $i \ne j$ and $X \ne 1$, then the mappings $S_1 = 1$ and $X^{S_i} = X^{-T_{i-1}}X^{T_i}$ for $i =
2, 3, \ldots, r$ are r-fold complete. Indeed, the products telescope, so that
$X^{S_1 + \cdots + S_{1+h}} = X^{T_{1+h}}$ and $X^{S_i + \cdots + S_{i+h}} = X^{-T_{i-1}}X^{T_{i+h}}$ for $i > 1$; hence if

$$X^{S_i + S_{i+1} + \cdots + S_{i+h}} = Y^{S_i + S_{i+1} + \cdots + S_{i+h}}$$

we have for $i = 1$

$$X^{T_{1+h}} = Y^{T_{1+h}},$$

and for $i > 1$

$$X^{-T_{i-1}}X^{T_{i+h}} = Y^{-T_{i-1}}Y^{T_{i+h}},$$

and therefore

$$(YX^{-1})^{T_{i-1}} = (YX^{-1})^{T_{i+h}},$$

and hence $Y = X$ in both cases, since by hypothesis $X^{T_i} \ne X^{T_j}$ for $i \ne j$ and
$X \ne 1$. $X^{S_i + \cdots + S_{i+h}}$ therefore takes m different values and reproduces every
element of G.

If we represent G as a regular permutation group then the squares $L_1 =
(1, P_2, \ldots, P_m)$, $L_2 = (1, P_2^{T_2}, \ldots, P_m^{T_2})$, ..., $L_r = (1, P_2^{T_r}, \ldots, P_m^{T_r})$ are
orthogonal Latin squares by Theorems 1 and 2. There exist, however, complete
mappings which are not derivable from automorphisms. For instance every
group of odd order admits the complete mapping $X^S = X$, but $X^T = X^2$ is not
an automorphism if the group is not abelian.

Most of the sets of orthogonal Latin squares that have been constructed
so far are based on abelian groups of type (p, p, ..., p), and the mappings of
the squares of the sets into each other are automorphisms of this group. R. C.
Bose [1] and W. L. Stevens [2], for instance, use the cyclic group of automorphisms
of the additive group of a GF($p^n$) induced through multiplication by the
elements of the Galois field that are different from 0. In this way they assure
that different automorphisms will map the same element into different elements.
They give a convenient method for finding a base element of the group of
automorphisms. In this way they reduce considerably the labor involved in the
construction of $p^n - 1$ orthogonal Latin squares of side $p^n$. The 9 × 9 squares
in the statistical tables by Fisher and Yates [3] are also based on the abelian
group of type (3, 3), but another set of automorphisms is used.
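The multiplication idea can be sketched for the additive group of a Galois field. The sketch assumes a field of prime order, or of order 2^n built from an irreducible polynomial; the polynomials below are standard choices, assumed here. Each nonzero k defines the automorphism x → kx. The square $L_k[i][j] = ki + j$ is the addition table with its rows rearranged by that automorphism. The sketch checks that these q - 1 squares are mutually orthogonal. It is not Bose's or Stevens' procedure for finding a base element, and it does not cover fields such as GF(9).

```python
from itertools import combinations

def gf2_mul(a, b, degree, modulus):
    """Product in GF(2**degree); elements are bit patterns, modulus an irreducible polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << degree):
            a ^= modulus
    return result

def automorphism_squares(q, add, mul):
    """One square per nonzero k: L_k[i][j] = k*i + j, i.e. the rows of the addition
    table rearranged by the automorphism x -> k*x of the additive group."""
    return [[[add(mul(k, i), j) for j in range(q)] for i in range(q)] for k in range(1, q)]

def is_latin(sq):
    q = len(sq)
    return (all(len(set(row)) == q for row in sq)
            and all(len({row[c] for row in sq}) == q for c in range(q)))

def mutually_orthogonal(squares):
    q = len(squares[0])
    return all(len({(A[i][j], B[i][j]) for i in range(q) for j in range(q)}) == q * q
               for A, B in combinations(squares, 2))

fields = {
    "GF(7)": (7, lambda a, b: (a + b) % 7, lambda a, b: (a * b) % 7),
    "GF(4)": (4, lambda a, b: a ^ b, lambda a, b: gf2_mul(a, b, 2, 0b111)),    # x^2 + x + 1
    "GF(8)": (8, lambda a, b: a ^ b, lambda a, b: gf2_mul(a, b, 3, 0b1011)),   # x^3 + x + 1
}
results = {}
for name, (q, add, mul) in fields.items():
    squares = automorphism_squares(q, add, mul)
    results[name] = (len(squares), all(map(is_latin, squares)), mutually_orthogonal(squares))
print(results)
```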

If $m = p_1^{\alpha_1}p_2^{\alpha_2}\cdots p_n^{\alpha_n}$ ($p_i$ prime, $p_i \ne p_k$ for $i \ne k$) and if $r = \min p_i^{\alpha_i} - 1$,
then a set of r orthogonal Latin squares can always be constructed from the
abelian group of type $(p_1, \ldots, p_1, p_2, \ldots, p_2, \ldots, p_n, \ldots, p_n)$ and its automorphisms.
This can be done by finding r automorphisms $T_1^{(i)}, T_2^{(i)}, \ldots, T_r^{(i)}$
for each of the subgroups of order $p_i^{\alpha_i}$ such that $T_j^{(i)}\big(T_k^{(i)}\big)^{-1}$ ($j \ne k$) leaves no element
unchanged except 1. If we apply the automorphisms $T_j^{(1)}, T_j^{(2)}, \ldots, T_j^{(n)}$ simultaneously,
for $j = 1, 2, \ldots, r$, we obtain r automorphisms of the desired type.

Once the automorphisms are known the construction of the set of orthogonal
Latin squares can easily be carried out. To do this we have to write down the
multiplication table of the group and obtain the orthogonal squares by interchanging
the rows in accord with the automorphisms.

Definition 5: A set of orthogonal Latin squares derived from a group and its
automorphisms will be called constructed by the automorphism method.

We now prove: 

Theorem 4. Let $c_q$ be the number of classes of elements of order q of a group G.
Let $s = \min c_q$; then not more than s orthogonal Latin squares can be constructed
from G by the automorphism method.

Proof: Let T be an automorphism which leaves no element unchanged
except 1. If A is of order q then $A^T$ is also of order q. If $A^T = P^{-1}AP$ then
there exists an element Q such that $P = Q^{-1}Q^T$ because, as we have shown, every
element can be represented in the form $X^{-1}X^T$. Since then $Q^T = QP$, we get

$$(QAQ^{-1})^T = QP \cdot P^{-1}AP \cdot P^{-1}Q^{-1} = QAQ^{-1}.$$

Hence $QAQ^{-1} = 1$, that is, $A = 1$. T can therefore not transform any element except 1 into an
element of the same class. Hence not more than $s = \min c_q$ automorphisms
$T_1, \ldots, T_s$ can exist such that $T_i^{-1}T_j$ ($i \ne j$) leaves no element except 1 fixed, and this
proves our theorem.

Corollary. If $m = p_1^{\alpha_1}p_2^{\alpha_2}\cdots p_n^{\alpha_n}$ ($p_i$ prime, $p_j \ne p_k$ for $j \ne k$) then not more
than $r = \min p_i^{\alpha_i} - 1$ orthogonal m-sided Latin squares can be constructed from
any group with the automorphism method.





Proof: The Sylow group of order $p_i^{\alpha_i}$ contains a representative of every class
of elements of order $p_i$; hence $\min c_q \le \min p_i^{\alpha_i} - 1$.

Below are given two examples of Graeco-Latin squares obtained from complete
mappings which are not obtained from automorphisms. Neither could
have been obtained by combining Graeco-Latin squares constructed by the
method of Bose [1] and Stevens [2].

The first example is based on the abelian group of type (2, 2, 3). If the basis
elements are defined by $P^2 = R^2 = Q^3 = 1$, the complete mapping used is given by

$$\begin{aligned} L_1 &= (1, P, R, PR, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2)\\ L_{12} &= (1, RQ, PRQ^2, PQ^2, Q, RQ^2, PR, P, Q^2, R, PRQ, PQ)\\ L_2 &= (1, PRQ, PQ^2, RQ^2, Q^2, PR, PQ, RQ, Q, PRQ^2, P, R). \end{aligned}$$

The second square is based on the regular representation of the alternating
group in 4 variables. The generating relations are $P^2 = R^2 = Q^3 = 1$,
$QP = RQ$, $QR = PRQ$. The complete mapping is given by

$$\begin{aligned} L_1 &= (1, P, R, PR, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2)\\ L_{12} &= (1, R, PR, P, Q, PQ, RQ, PRQ, Q^2, PQ^2, RQ^2, PRQ^2)\\ L_2 &= (1, PR, P, R, Q^2, PRQ^2, PQ^2, RQ^2, Q, RQ, PRQ, PQ). \end{aligned}$$
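The two mappings can be checked by computer. In the sketch below, Example 1's group is modelled as triples added modulo (2, 2, 3). For Example 2, a search over even permutations of four points finds elements P, R, Q satisfying the stated generating relations. That concrete model is an assumption of the sketch, not part of the paper. Each word such as PRQ² is written PRQQ and evaluated left to right. The check confirms four things:

- $L_1$ lists the whole group;
- $L_{12}$ is a rearrangement of it;
- $L_2$ is the elementwise product;
- the entries of $L_2$ are distinct, which is what makes the mapping complete.

```python
from itertools import permutations

def word_value(word, gens, mul, identity):
    """Evaluate a word such as "PRQQ" as the product P*R*Q*Q (empty word = identity)."""
    value = identity
    for letter in word:
        value = mul(value, gens[letter])
    return value

def check_example(gens, mul, identity, L1, L12, L2):
    val = lambda words: [word_value(w, gens, mul, identity) for w in words]
    g1, g12, g2 = val(L1), val(L12), val(L2)
    n = len(L1)
    return (len(set(g1)) == n and set(g12) == set(g1)
            and all(mul(a, b) == c for a, b, c in zip(g1, g12, g2))
            and len(set(g2)) == n)

L1 = ["", "P", "R", "PR", "Q", "PQ", "RQ", "PRQ", "QQ", "PQQ", "RQQ", "PRQQ"]

# Example 1: abelian group of type (2, 2, 3), written additively as triples.
add = lambda a, b: ((a[0] + b[0]) % 2, (a[1] + b[1]) % 2, (a[2] + b[2]) % 3)
ex1 = check_example({"P": (1, 0, 0), "R": (0, 1, 0), "Q": (0, 0, 1)}, add, (0, 0, 0), L1,
    ["", "RQ", "PRQQ", "PQQ", "Q", "RQQ", "PR", "P", "QQ", "R", "PRQ", "PQ"],
    ["", "PRQ", "PQQ", "RQQ", "QQ", "PR", "PQ", "RQ", "Q", "PRQQ", "P", "R"])

# Example 2: alternating group on 4 points; find P, R, Q with
# P^2 = R^2 = Q^3 = 1, QP = RQ, QR = PRQ (composition as the group product).
compose = lambda p, q: tuple(p[q[i]] for i in range(4))
e = (0, 1, 2, 3)
even = [p for p in permutations(range(4))
        if sum(p[i] > p[j] for i in range(4) for j in range(i + 1, 4)) % 2 == 0]
involutions = [p for p in even if p != e and compose(p, p) == e]
three_cycles = [p for p in even if p != e and compose(p, compose(p, p)) == e]
P, R, Q = next(
    (P, R, Q)
    for P in involutions for R in involutions if R != P
    for Q in three_cycles
    if compose(Q, P) == compose(R, Q) and compose(Q, R) == compose(compose(P, R), Q))
ex2 = check_example({"P": P, "R": R, "Q": Q}, compose, e, L1,
    ["", "R", "PR", "P", "Q", "PQ", "RQ", "PRQ", "QQ", "PQQ", "RQQ", "PRQQ"],
    ["", "PR", "P", "R", "QQ", "PRQQ", "PQQ", "RQQ", "Q", "RQ", "PRQ", "PQ"])
print(ex1, ex2)
```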


EXAMPLE 1

  1,1    2,2    3,3    4,4    5,5    6,6    7,7    8,8    9,9  10,10  11,11  12,12
  2,8    1,7    4,6    3,5   6,12   5,11   8,10    7,9   10,4    9,3   12,2   11,1
 3,10    4,9   1,12   2,11    7,2    8,1    5,4    6,3   11,6   12,5    9,8   10,7
 4,11   3,12    2,9   1,10    8,3    7,4    6,1    5,2   12,7   11,8   10,5    9,6
  5,9   6,10   7,11   8,12    9,1   10,2   11,3   12,4    1,5    2,6    3,7    4,8
  6,4    5,3    8,2    7,1   10,8    9,7   12,6   11,5   2,12   1,11   4,10    3,9
  7,6    8,5    5,8    6,7  11,10   12,9   9,12  10,11    3,2    4,1    1,4    2,3
  8,7    7,8    6,5    5,6  12,11  11,12   10,9   9,10    4,3    3,4    2,1    1,2
  9,5   10,6   11,7   12,8    1,9   2,10   3,11   4,12    5,1    6,2    7,3    8,4
10,12   9,11  12,10   11,9    2,4    1,3    4,2    3,1    6,8    5,7    8,6    7,5
 11,2   12,1    9,4   10,3    3,6    4,5    1,8    2,7   7,10    8,9   5,12   6,11
 12,3   11,4   10,1    9,2    4,7    3,8    2,5    1,6   8,11   7,12    6,9   5,10


EXAMPLE 2

  1,1    2,2    3,3    4,4    5,5    6,6    7,7    8,8    9,9  10,10  11,11  12,12
  2,4    1,3    4,2    3,1    6,8    5,7    8,6    7,5  10,12   9,11  12,10   11,9
  3,2    4,1    1,4    2,3    7,6    8,5    5,8    6,7  11,10   12,9   9,12  10,11
  4,3    3,4    2,1    1,2    8,7    7,8    6,5    5,6  12,11  11,12   10,9   9,10
  5,9   7,12   8,10   6,11    9,1   11,4   12,2   10,3    1,5    3,8    4,6    2,7
 6,12    8,9   7,11   5,10   10,4   12,1   11,3    9,2    2,8    4,5    3,7    1,6
 7,10   5,11    6,9   8,12   11,2    9,3   10,1   12,4    3,6    1,7    2,5    4,8
 8,11   6,10   5,12    7,9   12,3   10,2    9,4   11,1    4,7    2,6    1,8    3,5
  9,5   12,7   10,8   11,6    1,9   4,11   2,12   3,10    5,1    8,3    6,4    7,2
 10,7   11,5    9,6   12,8   2,11    3,9   1,10   4,12    6,3    7,1    5,2    8,4
 11,8   10,6   12,5    9,7   3,12   2,10    4,9   1,11    7,4    6,2    8,1    5,3
 12,6    9,8   11,7   10,5   4,10   1,12   3,11    2,9    8,2    5,4    7,3    6,1
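A mechanical check of the two squares is cheap. The sketch below hard-codes them, as transcribed in the code, as (first symbol, second symbol) pairs. It verifies that each component is a Latin square on the symbols 1 to 12 and that all 144 ordered pairs are distinct.

```python
EXAMPLE_1 = """
1,1 2,2 3,3 4,4 5,5 6,6 7,7 8,8 9,9 10,10 11,11 12,12
2,8 1,7 4,6 3,5 6,12 5,11 8,10 7,9 10,4 9,3 12,2 11,1
3,10 4,9 1,12 2,11 7,2 8,1 5,4 6,3 11,6 12,5 9,8 10,7
4,11 3,12 2,9 1,10 8,3 7,4 6,1 5,2 12,7 11,8 10,5 9,6
5,9 6,10 7,11 8,12 9,1 10,2 11,3 12,4 1,5 2,6 3,7 4,8
6,4 5,3 8,2 7,1 10,8 9,7 12,6 11,5 2,12 1,11 4,10 3,9
7,6 8,5 5,8 6,7 11,10 12,9 9,12 10,11 3,2 4,1 1,4 2,3
8,7 7,8 6,5 5,6 12,11 11,12 10,9 9,10 4,3 3,4 2,1 1,2
9,5 10,6 11,7 12,8 1,9 2,10 3,11 4,12 5,1 6,2 7,3 8,4
10,12 9,11 12,10 11,9 2,4 1,3 4,2 3,1 6,8 5,7 8,6 7,5
11,2 12,1 9,4 10,3 3,6 4,5 1,8 2,7 7,10 8,9 5,12 6,11
12,3 11,4 10,1 9,2 4,7 3,8 2,5 1,6 8,11 7,12 6,9 5,10
"""

EXAMPLE_2 = """
1,1 2,2 3,3 4,4 5,5 6,6 7,7 8,8 9,9 10,10 11,11 12,12
2,4 1,3 4,2 3,1 6,8 5,7 8,6 7,5 10,12 9,11 12,10 11,9
3,2 4,1 1,4 2,3 7,6 8,5 5,8 6,7 11,10 12,9 9,12 10,11
4,3 3,4 2,1 1,2 8,7 7,8 6,5 5,6 12,11 11,12 10,9 9,10
5,9 7,12 8,10 6,11 9,1 11,4 12,2 10,3 1,5 3,8 4,6 2,7
6,12 8,9 7,11 5,10 10,4 12,1 11,3 9,2 2,8 4,5 3,7 1,6
7,10 5,11 6,9 8,12 11,2 9,3 10,1 12,4 3,6 1,7 2,5 4,8
8,11 6,10 5,12 7,9 12,3 10,2 9,4 11,1 4,7 2,6 1,8 3,5
9,5 12,7 10,8 11,6 1,9 4,11 2,12 3,10 5,1 8,3 6,4 7,2
10,7 11,5 9,6 12,8 2,11 3,9 1,10 4,12 6,3 7,1 5,2 8,4
11,8 10,6 12,5 9,7 3,12 2,10 4,9 1,11 7,4 6,2 8,1 5,3
12,6 9,8 11,7 10,5 4,10 1,12 3,11 2,9 8,2 5,4 7,3 6,1
"""

def parse(text):
    return [[tuple(int(v) for v in cell.split(",")) for cell in line.split()]
            for line in text.strip().splitlines()]

def is_graeco_latin(square):
    m = len(square)
    symbols = list(range(1, m + 1))
    for k in (0, 1):                          # each component: symbols 1..m once per row/column
        rows = [[cell[k] for cell in row] for row in square]
        cols = [[row[c] for row in rows] for c in range(m)]
        if any(sorted(line) != symbols for line in rows + cols):
            return False
    return len({cell for row in square for cell in row}) == m * m   # every pair once

print(is_graeco_latin(parse(EXAMPLE_1)), is_graeco_latin(parse(EXAMPLE_2)))
```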





REFERENCES 

[1] R. C. Bose, “On the application of the properties of Galois fields to the problem of construction of Hyper-Graeco-Latin squares,” Sankhya, 1938.

[2] W. L. Stevens, “The completely orthogonalized Latin square,” Annals of Eugenics, 1939.

[3] R. A. Fisher and F. Yates, Statistical Tables for Agricultural, Biological, and Medical Research, Edinburgh, Oliver and Boyd.



A METHOD OF DETERMINING EXPLICITLY THE COEFFICIENTS 
OF THE CHARACTERISTIC EQUATION 

By P. A. Samuelson
Massachusetts Institute of Technology 

1. Introduction. When an investigator is interested in all of the latent roots 
of the characteristic equation of a matrix and not in its latent vectors, it is 
sometimes desirable to expand out the delerminental equation in order to de¬ 
termine explicitly the polynomial roeflicionts (pi ,jh , ■ • • , p„) in the expression 

(1) D(\) = | XI — a | = X" + piX" 1 4" ■ * • 4- Pn-iX 4- p n > 

This can be done in a variety of ways, all of which are necessarily somewhat tedious for high order matrices. Apart from the sign factor $(-1)^k$, the coefficient $p_k$ is the sum of the principal minors of order $k$ of $a$. These can be computed efficiently by "pivotal" methods [1]. Alternatively, through the utilization of the Cayley-Hamilton theorem, whereby a matrix satisfies its own characteristic equation, the p's appear as the solution of n linear equations [2, 3]. In a third method Horst has employed Newton's formula concerning the powers of roots to derive the p's as the solution of a triangular set of equations, the coefficients of the latter only being attained after considerable matrix multiplication [4]. A fourth method, suggested to me by Professor E. Bright Wilson, Jr. of Harvard University, consists of evaluating $D(\lambda)$ for n values of $\lambda$, presumably by efficient "Doolittle" methods; to these n points Lagrange's interpolation formula is applied to determine the n coefficients explicitly.
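As a point of comparison for the first of these routes, the following minimal sketch (Python with exact `fractions` arithmetic; the helper names `det` and `coefficients_by_minors` are hypothetical) forms $p_k$ as $(-1)^k$ times the sum of the $k$-rowed principal minors. It enumerates every minor by brute force, so it illustrates the definition only and is not the efficient pivotal scheme of [1].

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Determinant by exact Gaussian elimination with row interchanges."""
    m = [[Fraction(v) for v in row] for row in m]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if m[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            m[c], m[piv] = m[piv], m[c]
            result = -result  # an interchange flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return result


def coefficients_by_minors(a):
    """Return [1, p_1, ..., p_n] with p_k = (-1)^k * (sum of k-by-k principal minors)."""
    n = len(a)
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        total = sum(det([[a[i][j] for j in idx] for i in idx])
                    for idx in combinations(range(n), k))
        coeffs.append((-1) ** k * total)
    return coeffs
```

For example, `coefficients_by_minors([[2, 1], [1, 3]])` returns Fractions equal to `[1, -5, 5]`, i.e. $\lambda^2 - 5\lambda + 5$.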


2. The New Method. The present paper describes a new computational method based upon well-known dynamical considerations. A single nth order differential equation can be converted into "normal" form, involving n first order differential equations. This is easily done by defining appropriate new variables, for instance $x_1 = x,\ x_2 = x',\ \ldots,\ x_n = x^{(n-1)}$. If the original nth order differential equation is written as

$$x^{(n)}(t) + p_1x^{(n-1)}(t) + \cdots + p_{n-1}x'(t) + p_nx(t) = 0, \tag{2}$$

then the new normal system can be written as


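$$x_i'(t) = \sum_{j=1}^{n} b_{ij}\,x_j(t) \qquad (i = 1, \ldots, n), \tag{3}$$

where

$$b = [b_{ij}] = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ -p_n & -p_{n-1} & -p_{n-2} & \cdots & -p_1 \end{bmatrix} \tag{4}$$

is the so-called companion matrix to the polynomial in question.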
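The correspondence between (2) and (3)-(4) can be checked concretely. In the sketch below (Python; `companion` and `mat_vec` are hypothetical helper names), the companion matrix is built from $p = [p_1, \ldots, p_n]$ with the new variables assumed to be $x_1 = x,\ x_2 = x',\ \ldots,\ x_n = x^{(n-1)}$. For each root $\lambda$ of $D$, the vector $(1, \lambda, \ldots, \lambda^{n-1})$, which is the state of the solution $x = e^{\lambda t}$, should then be an eigenvector of $b$ with eigenvalue $\lambda$.

```python
def companion(p):
    """Companion matrix (4) of lambda^n + p_1 lambda^(n-1) + ... + p_n, given p = [p_1, ..., p_n]."""
    n = len(p)
    b = [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n - 1)]
    b.append([-p[n - 1 - j] for j in range(n)])  # last row: -p_n, -p_(n-1), ..., -p_1
    return b


def mat_vec(b, v):
    """Product of the matrix b with the column vector v."""
    return [sum(bij * vj for bij, vj in zip(row, v)) for row in b]


# x''' - 6x'' + 11x' - 6x = 0, i.e. D(lambda) = (lambda - 1)(lambda - 2)(lambda - 3)
b = companion([-6, 11, -6])
for root in (1, 2, 3):
    state = [root ** k for k in range(3)]  # (x, x', x'') for x = e^(root t) at t = 0
    assert mat_vec(b, state) == [root * c for c in state]
```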

424 




CHARACTERISTIC EQUATION 


425 


The reverse process of going from a normal system in many variables to a single high order equation is not so simple. Yet it can be done, and in so doing we attain the required polynomial coefficients [5]. If

$$x'(t) = a\,x(t) \tag{5}$$

represents the normal system in matrix form, then symbolically

$$D\!\left(\frac{d}{dt}\right)x_1(t) = x_1^{(n)}(t) + p_1x_1^{(n-1)}(t) + \cdots + p_{n-1}x_1'(t) + p_nx_1(t) = 0. \tag{6}$$

Because we wish to find out the expanded form of $D(\lambda)$, this relationship is of no use to us. Since similar matrices have the same characteristic equation, ours is the problem of finding a non-singular matrix C such that

$$C^{-1}aC = b, \tag{7}$$

where b is of the form given in equation (4).

This problem can be approached from an elementary algebraic viewpoint. The relationships in (5) represent n linear equations between 2n variables, $[x_1(t), x_2(t), \ldots, x_n(t), x_1'(t), x_2'(t), \ldots, x_n'(t)]$. These are not sufficient to eliminate the $2(n-1)$ variables not involving the subscript 1. However, inasmuch as (5) holds for all values of t, we may differentiate it repeatedly until we finally have the system of equations

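$$\begin{aligned}
-x_1^{(n)} + a_{11}x_1^{(n-1)} + \cdots + a_{1n}x_n^{(n-1)} &= 0\\
&\;\;\vdots\\
-x_n^{(n)} + a_{n1}x_1^{(n-1)} + \cdots + a_{nn}x_n^{(n-1)} &= 0\\
-x_1^{(n-1)} + a_{11}x_1^{(n-2)} + \cdots + a_{1n}x_n^{(n-2)} &= 0\\
&\;\;\vdots\\
-x_n^{(n-1)} + a_{n1}x_1^{(n-2)} + \cdots + a_{nn}x_n^{(n-2)} &= 0\\
&\;\;\vdots\\
-x_1' + a_{11}x_1 + \cdots + a_{1n}x_n &= 0\\
&\;\;\vdots\\
-x_n' + a_{n1}x_1 + \cdots + a_{nn}x_n &= 0
\end{aligned} \tag{8}$$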

These are $n^2$ linear equations in $n^2 + n$ variables. We wish to eliminate all variables which have a subscript other than one, namely $(x_2, \ldots, x_n, x_2', \ldots, x_n', \ldots, x_2^{(n)}, \ldots, x_n^{(n)})$. These are $(n+1)(n-1) = n^2 - 1$ in number. We may utilize all but one of the $n^2$ equations to perform this elimination. The remaining equation after substitution will be the desired high order equation, and its coefficients are the polynomial coefficients.

Ordinarily one would solve all but one of the equations for the values of the variables to be eliminated. These would then be substituted into the remaining equation. Actually, from the computational standpoint it is unnecessary to solve completely for any unknowns. The so-called "forward" solution of the usual Gauss-Doolittle technique automatically performs the elimination or






426 


P. A. SAMUELSON


substitution, without necessary recourse to a "back" solution for the values of the eliminated variables. These values are in any case of no interest.

There is no unique order in which the equations must be reduced. Indeed, 
when one order fails because a leading principal minor vanishes, we may switch 
to another. A suggested convenient order is given below. Let 


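$$a = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} a_{11} & R\\ S & M \end{bmatrix}, \qquad I = [\delta_{ij}] \quad (i, j = 1, \ldots, n-1),$$

so that $R = [a_{12}\ \cdots\ a_{1n}]$ is a row vector, $S = [a_{21}\ \cdots\ a_{n1}]'$ is a column vector, $M = [a_{ij}]\ (i, j = 2, \ldots, n)$, and $I$ is the identity matrix of order $n - 1$.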


Then, consider the partitioned matrix: 


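$$W = \left[\begin{array}{cccccc|cccccc}
-I & M & 0 & \cdots & 0 & 0 & 0 & -S & 0 & \cdots & 0 & 0\\
0 & -I & M & \cdots & 0 & 0 & 0 & 0 & -S & \cdots & 0 & 0\\
\vdots & & \ddots & \ddots & & \vdots & \vdots & & & \ddots & & \vdots\\
0 & 0 & 0 & \cdots & -I & M & 0 & 0 & 0 & \cdots & 0 & -S\\ \hline
0 & 0 & 0 & \cdots & 0 & R & 0 & 0 & 0 & \cdots & 1 & -a_{11}\\
\vdots & & & & & \vdots & \vdots & & & & & \vdots\\
0 & 0 & R & \cdots & 0 & 0 & 0 & 1 & -a_{11} & \cdots & 0 & 0\\
0 & R & 0 & \cdots & 0 & 0 & 1 & -a_{11} & 0 & \cdots & 0 & 0
\end{array}\right] \tag{9}$$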


It is simply the matrix of the equations in (8) with the variables $(x_1, x_1', \ldots, x_1^{(n)})$ shifted over to the right-hand side, and with the equations in which the variable $x_1$ leads off placed at the bottom.

If the usual "forward" Doolittle technique is followed, then the final elements computed, corresponding to the elements in the lower right-hand box, are the coefficients $(1, p_1, p_2, \ldots, p_n)$. It is the present writer's experience that the Crout form [6], which like Dwyer's [7] represents the last word in Doolittle abbreviation, is to be recommended, particularly since we are dealing with an asymmetrical matrix. A clerk masters its ritual in a few minutes, and the speeds achieved once the operations become mechanical are impressive.
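The whole procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's worksheet: it uses exact `Fraction` arithmetic in place of a desk calculator, orders the columns of W as $y^{(n)}, \ldots, y^{(0)}, x_1^{(n)}, \ldots, x_1^{(0)}$ with $y = (x_2, \ldots, x_n)$, replaces the Crout bookkeeping by plain row operations, and searches for a usable pivot row whenever a leading element vanishes. The names `build_w`, `forward_eliminate` and `samuelson_coefficients` are hypothetical.

```python
from fractions import Fraction


def build_w(a):
    """Assemble the n^2 x (n^2 + n) matrix W of (9) for an n x n matrix a (n >= 2).

    A row [L | T] encodes the equation L . (eliminated variables) = T . (x_1^(n), ..., x_1).
    """
    n, m = len(a), len(a) - 1
    a11 = Fraction(a[0][0])
    R = [Fraction(v) for v in a[0][1:]]
    S = [Fraction(a[i][0]) for i in range(1, n)]
    M = [[Fraction(v) for v in row[1:]] for row in a[1:]]
    left = (n + 1) * m

    def ycol(d, j):  # column of x_(j+2)^(d)
        return (n - d) * m + j

    def xcol(d):  # column of x_1^(d)
        return left + (n - d)

    rows = []
    for k in range(n - 1, -1, -1):  # top rows: -y^(k+1) + M y^(k) = -S x_1^(k)
        for i in range(m):
            row = [Fraction(0)] * (left + n + 1)
            row[ycol(k + 1, i)] = Fraction(-1)
            for j in range(m):
                row[ycol(k, j)] = M[i][j]
            row[xcol(k)] = -S[i]
            rows.append(row)
    for k in range(n):  # bottom rows: R y^(k) = x_1^(k+1) - a11 x_1^(k)
        row = [Fraction(0)] * (left + n + 1)
        for j in range(m):
            row[ycol(k, j)] = R[j]
        row[xcol(k + 1)] = Fraction(1)
        row[xcol(k)] = -a11
        rows.append(row)
    return rows, left


def forward_eliminate(rows, left):
    """Forward elimination of the first `left` columns only (no back solution).

    Returns the right-hand part of the one row left over, scaled to start with 1.
    """
    rows = [list(r) for r in rows]
    unused = list(range(len(rows)))
    for col in range(left):
        piv = next((r for r in unused if rows[r][col] != 0), None)
        if piv is None:
            raise ValueError("singular case: the elimination cannot be completed")
        unused.remove(piv)
        for r in unused:
            f = rows[r][col] / rows[piv][col]
            if f:
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[piv])]
    (last,) = unused
    tail = rows[last][left:]
    return [c / tail[0] for c in tail]


def samuelson_coefficients(a):
    """[1, p_1, ..., p_n] for |lambda I - a| via forward elimination on W."""
    return forward_eliminate(*build_w(a))
```

For the 2 by 2 case the four rows produced by `build_w` are exactly those of (10) below, and `samuelson_coefficients([[2, 1], [1, 3]])` yields values equal to `[1, -5, 5]`. The pivot search stands in for the remark that another order may be used when a leading principal minor vanishes; a genuinely singular case still raises an error.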

For the trivial case of determining the coefficients corresponding to a two by two matrix, the W matrix is of the form


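$$\left[\begin{array}{ccc|ccc}
-1 & a_{22} & 0 & 0 & -a_{21} & 0\\
0 & -1 & a_{22} & 0 & 0 & -a_{21}\\ \hline
0 & 0 & a_{12} & 0 & 1 & -a_{11}\\
0 & a_{12} & 0 & 1 & -a_{11} & 0
\end{array}\right] \tag{10}$$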



The Auxiliary Crout matrix becomes 


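$$\left[\begin{array}{ccc|ccc}
-1 & a_{22} & 0 & 0 & -a_{21} & 0\\
0 & -1 & a_{22} & 0 & 0 & -a_{21}\\ \hline
0 & 0 & a_{12} & 0 & 1 & -a_{11}\\
0 & -a_{12} & a_{22} & 1 & (-a_{11} - a_{22}) & (-a_{12}a_{21} + a_{11}a_{22})
\end{array}\right] \tag{11}$$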












The answer in the lower right-hand box, $1,\ -(a_{11} + a_{22}),\ a_{11}a_{22} - a_{12}a_{21}$, will immediately be recognized as the correct one, since $|\lambda I - a| = \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21})$. I have found it convenient to vary the precise Crout routine by dividing vertical columns by the "leading" diagonal element, rather than horizontal rows. This is a matter of indifference and saves some computations. As in the higher order cases, the presence of the identity matrix along the diagonal reduces most of the computations to mere copying. Actually the intelligent computer will soon notice that most of the copying may be eliminated, since the numbers in question are to be added in later in other sums of products. After eliminating the unknowns by means of the equations above the horizontal partition line of (9), there results the system


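$$\left[\begin{array}{c|ccccccc}
R & 0 & \cdots & 0 & 0 & 0 & 1 & -a_{11}\\
RM & 0 & \cdots & 0 & 0 & 1 & -a_{11} & -RS\\
RM^2 & 0 & \cdots & 0 & 1 & -a_{11} & -RS & -RMS\\
\vdots & & & & & & & \vdots\\
RM^{n-1} & 1 & -a_{11} & -RS & -RMS & \cdots & \cdots & -RM^{n-2}S
\end{array}\right]$$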


Thus, it would be simpler to start from this stage, avoiding unnecessary copying. 
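Starting from this stage can be sketched as follows (Python, exact `Fraction` arithmetic; the function name is hypothetical, and the pivot search is an assumption about how one would switch order when a leading element vanishes). The rows $[RM^k \mid 0, \ldots, 0, 1, -a_{11}, -RS, \ldots, -RM^{k-1}S]$, $k = 0, \ldots, n-1$, are generated directly from $R$, $S$ and $M$, and only the $n - 1$ left-hand columns are eliminated.

```python
from fractions import Fraction


def coefficients_from_reduced_system(a):
    """[1, p_1, ..., p_n] by forward elimination on the reduced n x (n-1 | n+1) system."""
    n = len(a)
    a11 = Fraction(a[0][0])
    R = [Fraction(v) for v in a[0][1:]]
    S = [Fraction(a[i][0]) for i in range(1, n)]
    M = [[Fraction(v) for v in row[1:]] for row in a[1:]]

    r_pows = [R]  # R, RM, RM^2, ..., RM^(n-1)
    for _ in range(n - 1):
        prev = r_pows[-1]
        r_pows.append([sum(prev[i] * M[i][j] for i in range(n - 1)) for j in range(n - 1)])
    # Scalars -a11, -RS, -RMS, ..., -RM^(n-2)S of the right-hand block
    scalars = [-a11] + [-sum(u * s for u, s in zip(r_pows[k], S)) for k in range(n - 1)]
    rows = [r_pows[k] + [Fraction(0)] * (n - 1 - k) + [Fraction(1)] + scalars[:k + 1]
            for k in range(n)]

    unused = list(range(n))
    for col in range(n - 1):  # eliminate the n - 1 left-hand columns
        piv = next((r for r in unused if rows[r][col] != 0), None)
        if piv is None:
            raise ValueError("singular case: R, RM, ... are linearly dependent")
        unused.remove(piv)
        for r in unused:
            f = rows[r][col] / rows[piv][col]
            rows[r] = [x - f * y for x, y in zip(rows[r], rows[piv])]
    (last,) = unused
    tail = rows[last][n - 1:]
    return [c / tail[0] for c in tail]
```

Compared with the full W, the $-I$ blocks and the copying they entail never appear; only products of the row vectors $RM^k$ with $M$ and with $S$ are formed.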

This remark shows that the present method is related to the Cayley-Hamilton 
methods described in [2] and [3], since the above set is derivable from the set 


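$$\left[\begin{array}{c|ccccc}
e_1A^0 & 1 & 0 & 0 & \cdots & 0\\
e_1A^1 & 0 & 1 & 0 & \cdots & 0\\
e_1A^2 & 0 & 0 & 1 & \cdots & 0\\
\vdots & & & & \ddots & \\
e_1A^n & 0 & 0 & 0 & \cdots & 1
\end{array}\right]$$

where $e_1 = [1\ 0\ \cdots\ 0]$, so that $e_1A^k$ is the first row of the kth power of the matrix $A = a$.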


The last named set appears in the Cayley-Hamilton method when the first rows of the powers of the original matrix are used in setting up n equations to determine our n unknowns. Although related, the two methods are distinct, since in the Cayley-Hamilton method one would arrive at a different set of equations after straightforward elimination of one variable, and since it would be shorter to dispense with the identity matrix used in the Aitken method in favor of the solution of a single set of equations by the usual Doolittle "back-solution."
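For comparison, the first-row Cayley-Hamilton route with an ordinary back-solution might be sketched as follows (Python, exact arithmetic; all helper names are hypothetical). The rows $e_1A^0, \ldots, e_1A^n$ are generated by repeated row-by-matrix products, and the p's are solved from $e_1A^n + p_1e_1A^{n-1} + \cdots + p_ne_1A^0 = 0$, one equation per component.

```python
from fractions import Fraction


def first_row_powers(a, count):
    """Rows e_1 A^0, ..., e_1 A^(count-1), each from the previous by one row-by-matrix product."""
    n = len(a)
    row = [Fraction(1)] + [Fraction(0)] * (n - 1)
    rows = []
    for _ in range(count):
        rows.append(row)
        row = [sum(row[i] * a[i][j] for i in range(n)) for j in range(n)]
    return rows


def solve(mat, rhs):
    """Gaussian elimination followed by the usual back-solution."""
    n = len(mat)
    aug = [[Fraction(v) for v in r] + [Fraction(b)] for r, b in zip(mat, rhs)]
    for c in range(n):
        piv = next((r for r in range(c, n) if aug[r][c] != 0), None)
        if piv is None:
            raise ValueError("singular case: the first rows of the powers are dependent")
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][j] * x[j] for j in range(r + 1, n))) / aug[r][r]
    return x


def coefficients_by_first_rows(a):
    """[1, p_1, ..., p_n] from e_1 A^n + p_1 e_1 A^(n-1) + ... + p_n e_1 A^0 = 0."""
    n = len(a)
    k = first_row_powers(a, n + 1)
    # Unknown q_j multiplies e_1 A^j, so q_j = p_(n-j); equation i is the i-th component.
    mat = [[k[j][i] for j in range(n)] for i in range(n)]
    q = solve(mat, [-k[n][i] for i in range(n)])
    return [Fraction(1)] + [q[n - m] for m in range(1, n + 1)]
```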

The reader will easily see how the method may be modified to handle the more general case of determining the coefficients of

$$D(\lambda) = |c\lambda + a| = 0, \tag{14}$$

where c and a are any matrices. The method also can be used to reduce a polynomial equation involving a determinant of the nth order, each of whose coefficients is of a given degree in $\lambda$, to a lower order determinant whose coefficients are of higher degree in $\lambda$.





The present method derives the p's as the algebraic solution of high order linear equations ($n^2$ of them). It would therefore seem inferior to those methods which need only solve a system of n equations. However, two remarks are in order. The matrix of the high order system can be written down immediately without computation. Furthermore, most of the elements in the matrix are zeros, so that a mere counting of the equations is not a true indication of the labor involved.

3. Some comparisons between the present method and other methods. Within the brief compass of the present work it is not possible to give an exhaustive appraisal of the comparative computational efficiencies of the methods mentioned. In general, a computing method is to be judged in terms of the number of multiplications that it involves, although other considerations such as the number of additions, the magnitude and sign of the numbers handled, the repetitiveness of the operations involved, the adaptability to punch card machinery, etc. are modifying factors. In this discussion the power of a method will be taken to be an inverse function of the number of multiplications that it involves.

It may be said first of all that, inasmuch as the minimum number of multiplications involved in computing an nth order determinant is of the order of $n^3$, even with the most efficient "pivotal" methods, direct computation of the coefficients by principal minors involves, for sufficiently large n, computation of the order of $n^4$. The same is true of the Wilson method described above. The Horst method, and any other that requires the explicit n powers of an nth order matrix, also asymptotically requires multiplications of the order of $n^4$. This does not mean that the above three methods are equally powerful for small n, nor even asymptotically, since the coefficients of the $n^4$ term in the formula for the requisite number of multiplications may not be equal. In fact, Riersol [1] has shown that his method is better than Horst's for small n, but asymptotically less powerful.

It can also be shown that the Cayley-Hamilton methods, which simply involve products of the powers of a matrix with row or column vectors, are asymptotically more powerful than any of the above methods, the work only increasing as the cube of n. This is true whether the longer Aitken form of reduction is employed or whether the usual Doolittle back-solution is followed. The present method is also an efficient one in the sense that its requisite number of multiplications increases with the cube of n. For small values of n and asymptotically it can be shown to be more powerful than the Cayley-Hamilton method which uses the Aitken method of reduction, although in the limit as n becomes large the ratio of the powers of the two methods approaches unity.

It is of the greatest interest to compare the power of the new method with the shorter Doolittle C-H method. It can easily be shown that the coefficients of $n^3$ in the expressions giving the respective requisite numbers of multiplications differ in such a way as to make the C-H method more powerful after some value of n,





the ratio of the respective powers approaching the limit 8/9. However, for low order matrices the new method is the more powerful. The reader may easily verify this for the case of a second order matrix. Below a sixth order matrix the present method seems to involve the smaller number of multiplications. For a sixth order matrix the two methods seem to involve the same number of multiplications (multiplications by unity not being counted). For matrices of the seventh order or higher the C-H method seems to be optimal.

As compared to an explicit evaluation of the coefficients by a straightforward computation of principal minors according to the fundamental definition of a determinant as the sum of signed products of elements, all of the methods discussed are efficient, since the work in the former increases faster than any power of n. However, for each of the methods discussed, in singular cases the method of reduction may fail, so that modified procedures will be necessary. In actual practice such singularities will "almost never" be encountered. But in the neighborhood of such singular points the computations become extremely sensitive to any rounding off of digits. Consequently, it is from the nature of the case impossible ever to develop exact rules for the maximum error involved in any given calculation.

REFERENCES 

[1] O. Riersol, "Recurrent Computations of all Principal Minors of a Determinant," Annals of Math. Stat., Vol. 11 (1940), pp. 193-198.

[2] R. A. Frazer, W. J. Duncan, and A. R. Collar, Elementary Matrices, pp. 141-142.

[3] M. M. Flood, "A Computational Procedure for the Method of Principal Components," Psychometrika, Vol. 5 (1940), pp. 169-172.

[4] P. Horst, "A Method for Determining the Coefficients of a Characteristic Equation," Annals of Math. Stat., Vol. 5 (1936), pp. 83-84.

[5] F. R. Moulton, Differential Equations, pp. 6-9.

[6] P. D. Crout, "A Short Method for Evaluating Determinants and Solving Systems of Linear Equations with Real or Complex Coefficients," American Institute of Electrical Engineers, Vol. 60 (1941).

[7] P. S. Dwyer, "The Solution of Simultaneous Equations," Psychometrika, Vol. 6 (1941), pp. 101-129.



NOTES 

This section is devoted to brief research and expository articles, notes on methodology 
and other short items. 


A NOTE ON THE THEORY OF MOMENT GENERATING FUNCTIONS 


By J. H. Curtiss
Cornell University 


Let X be a one-dimensional variate and let F(x) be its distribution function.¹ The function

$$G(\alpha) = E(e^{\alpha X}) = \int_{-\infty}^{\infty} e^{\alpha x}\,dF(x), \qquad \alpha \text{ real},$$

in which the integral is assumed to converge for α in some neighborhood of the origin, is called the moment generating function of X. In dealing with certain distribution problems, this function has been widely used by statisticians, and especially by the English writers, in place of the closely-related characteristic function $f(t) = E(e^{itX})$. It is known that a characteristic function uniquely determines the corresponding distribution, and that if a sequence of characteristic functions approaches a limit, the corresponding sequence of distribution functions does likewise. (These results are more accurately stated below.) The analogues of these theorems for the moment generating function are apparently not too readily accessible in the literature, if they have been treated at all, and it seems worthwhile to record them in this note.

Henceforth we abbreviate distribution function to d.f., moment generating function to m.g.f., and characteristic function to c.f. The variables α and t will always be real, in contradistinction to the complex variable s, to be introduced in the next paragraph.

The uniqueness property of the c.f. may be stated as follows: If $F_1(x)$ and $f_1(t)$ are the d.f. and c.f. of one variate, and $F_2(x)$ and $f_2(t)$ are those of another, and if $f_1(t) = f_2(t)$ for all t,² then $F_1(x) \equiv F_2(x)$ for all x [1, p. 28]. To study the corresponding situation for the m.g.f., we first observe that

$$\varphi(s) = E(e^{sX}) = \int_{-\infty}^{\infty} e^{sx}\,dF(x), \qquad s \text{ complex},$$


¹ Or cumulative frequency function; our notation and terminology are uniform with those of [1] except for the use of the term "variate" instead of "random variable."

² It is possible for two non-identical distributions to have c.f.'s which are identical throughout an interval of values of t containing the origin; an example is given in [4], p. 100. The author is obliged to Professor Wintner and Professor Feller for pointing out the existence of this particular example.


430 



MOMENT GENERATING FUNCTIONS


431 


is a bilateral Laplace-Stieltjes transform. If such a transform exists for real values of s in an interval $-\alpha_1 < s < \alpha_1$, $\alpha_1 > 0$, it must exist for all complex values of s in the strip $-\alpha_1 < \Re s < \alpha_1$, and represent there an analytic function of s [5, p. 238]. Evidently $\varphi(\alpha) = G(\alpha)$, $\varphi(it) = f(t)$. Suppose now that $F_1(x)$, $G_1(\alpha)$, $f_1(t)$ are the d.f., m.g.f., and c.f. of a variate $X_1$, and $F_2(x)$, $G_2(\alpha)$, $f_2(t)$ are those of $X_2$. Let $\varphi_1(s) = E(e^{sX_1})$, $\varphi_2(s) = E(e^{sX_2})$, s complex. If $G_1(\alpha) \equiv G_2(\alpha)$ for all α in some interval, however small, containing the origin, then by a familiar property of analytic functions [2, p. 116], $\varphi_1(s) \equiv \varphi_2(s)$ throughout the corresponding strip of analyticity, and so on the axis of imaginaries. This means that $f_1(t) \equiv f_2(t)$, all t, and therefore $F_1(x) \equiv F_2(x)$. We have:

Theorem 1. A m.g.f. existing in some neighborhood of α = 0 uniquely determines the corresponding distribution.

We turn now to distributions of variable form. Because certain of the versions to be found in the literature are incomplete, it seems worth while to give here a full statement of the basic limit theorem for sequences of c.f.'s, due to P. Lévy and sometimes called Lévy's Continuity Theorem [4, pp. 48-50].

Theorem 2. Let the distribution of a variate $X_n$ depend on a parameter n, and let $F_n(x)$ and $f_n(t)$ be the d.f. and c.f. of $X_n$.

(a) If there exists a variate X with d.f. F(x) such that $\lim_{n\to\infty} F_n(x) = F(x)$ at every continuity point of F(x), then $\lim_{n\to\infty} f_n(t) = f(t)$ uniformly in each finite interval on the t-axis, where f(t) is the c.f. of X.

(b) If there exists a function f(t) such that $\lim_{n\to\infty} f_n(t) = f(t)$, all t,³ and uniformly⁴ in some open interval containing the origin, then there exists a variate X with d.f. F(x) such that $\lim_{n\to\infty} F_n(x) = F(x)$ at each continuity point and uniformly in any finite or infinite interval of continuity of F(x). The c.f. of X is f(t), and $\lim_{n\to\infty} f_n(t) = f(t)$ uniformly in each finite interval.

We now develop the corresponding theorem for the m.g.f. In the first place, it is not difficult to see that part (a) will have no direct analogue, even if we add to the hypothesis the conditions that the m.g.f. of $X_n$ exists in some fixed interval for all n and that the m.g.f. of X also exists in some interval. For example, the d.f.

$$F_n(x) = \begin{cases} 0, & x < -n,\\ \tfrac12 + k_n \arctan nx, & -n \le x < n,\\ 1, & x \ge n, \end{cases}$$


³ The condition that $\lim_{n\to\infty} f_n(t)$ exist on at least an everywhere dense set of points on the t-axis is essential to the proof as given in Cramér's book [1, pp. 29-30], but is omitted in his statement of the theorem, and is not stated clearly in certain other treatments by other authors.

⁴ For a discussion of this uniformity condition, and possible alternatives, see [1, p. 29 (footnote)]. The condition may, for instance, be replaced by the assumption that f(t) is continuous at t = 0.



432 


J. H. CURTISS 


where $k_n = 1/(2\arctan n^2)$, clearly tends as $n \to \infty$ to the d.f.

$$F(x) = \begin{cases} 0, & x < 0,\\ 1, & x \ge 0, \end{cases}$$


at all points of continuity of the latter d.f. The m.g.f. corresponding to $F_n(x)$ is

$$G_n(\alpha) = \int_{-n}^{n} k_n e^{\alpha x}\,\frac{n\,dx}{1 + n^2x^2},$$

which for each n exists for all α, and the m.g.f. corresponding to F(x) is simply the constant 1. Clearly



$$G_n(\alpha) = k_n\int_{-n}^{n} \cosh(\alpha x)\,\frac{n}{1 + n^2x^2}\,dx,$$

and from this it can easily be verified that $\lim_{n\to\infty} G_n(\alpha) = \infty$ if $\alpha \ne 0$. In short, mere convergence of a sequence of d.f.'s tells little about the behavior of the corresponding sequence of m.g.f.'s.
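A minimal numerical sketch of this example (Python standard library; `f_n` and `g_n` are hypothetical helper names) evaluates $F_n$ directly and the m.g.f. $G_n(\alpha)$ of $F_n$ by composite Simpson's rule. The step $1/(50n)$ is a choice of this sketch, made to resolve the peak of width about $1/n$ at the origin. It shows $F_n(\pm 0.5)$ approaching the step function while $G_n(1)$ grows without bound.

```python
import math


def f_n(n, x):
    """The d.f. F_n(x) of the example, with k_n = 1 / (2 arctan n^2)."""
    if x < -n:
        return 0.0
    if x >= n:
        return 1.0
    return 0.5 + math.atan(n * x) / (2.0 * math.atan(n * n))


def g_n(n, alpha):
    """G_n(alpha): integral over [-n, n] of k_n e^(alpha x) n / (1 + n^2 x^2), composite Simpson."""
    k_n = 1.0 / (2.0 * math.atan(n * n))
    m = 100 * n * n  # even number of subintervals, step h = 1 / (50 n)
    h = 2.0 * n / m
    total = 0.0
    for i in range(m + 1):
        x = -n + i * h
        weight = 1 if i in (0, m) else (4 if i % 2 else 2)
        total += weight * k_n * n * math.exp(alpha * x) / (1.0 + (n * x) ** 2)
    return total * h / 3.0


for n in (5, 10, 20):
    print(n, f_n(n, -0.5), f_n(n, 0.5), g_n(n, 0.0), g_n(n, 1.0))
```

The value $G_n(0)$ stays at 1 (total probability), while $G_n(1)$ is driven by the thin tails near $x = \pm n$, where $e^{x}$ outweighs the small density.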

Part (b) assumes the following form:

Theorem 3. Let $F_n(x)$ and $G_n(\alpha)$ be respectively the d.f. and m.g.f. of a variate $X_n$. If $G_n(\alpha)$ exists for $|\alpha| < \alpha_1$ and for all $n \ge n_0$, and if there exists a finite-valued function $G(\alpha)$ defined for $|\alpha| \le \alpha_2 < \alpha_1$, $\alpha_2 > 0$, such that $\lim_{n\to\infty} G_n(\alpha) = G(\alpha)$, $|\alpha| \le \alpha_2$, then there exists a variate X with d.f. F(x) such that $\lim_{n\to\infty} F_n(x) = F(x)$ at each continuity point and uniformly in each finite or infinite interval of continuity of F(x). The m.g.f. of X exists for $|\alpha| \le \alpha_2$ and is equal to $G(\alpha)$ in that interval.
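As an illustration chosen for this sketch (it is not discussed in the note), Theorem 3 applies to $X_n$ binomial with parameters $n$ and $\lambda/n$: the m.g.f.'s $(1 - \lambda/n + (\lambda/n)e^{\alpha})^n$ converge for every α to $\exp\{\lambda(e^{\alpha} - 1)\}$, the m.g.f. of the Poisson law, so the theorem yields the familiar Poisson limit of the d.f.'s. A minimal numerical check (Python; helper names hypothetical):

```python
import math


def binomial_mgf(n, p, alpha):
    """m.g.f. of a binomial (n, p) variate."""
    return (1.0 - p + p * math.exp(alpha)) ** n


def poisson_mgf(lam, alpha):
    """m.g.f. of a Poisson variate with mean lam."""
    return math.exp(lam * (math.exp(alpha) - 1.0))


def mgf_gaps(lam, alpha, ns=(10, 100, 1000, 10**6)):
    """|G_n(alpha) - G(alpha)| for X_n binomial (n, lam / n) and the Poisson limit G."""
    return [abs(binomial_mgf(n, lam / n, alpha) - poisson_mgf(lam, alpha)) for n in ns]


for alpha in (-0.5, 0.0, 0.5):
    print(alpha, mgf_gaps(2.0, alpha))
```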

To prove the theorem, we introduce the Laplace transform $\varphi_n(s) = E(e^{sX_n})$ and observe that $|\varphi_n(s)| \le \varphi_n(\alpha) = G_n(\alpha)$, $s = \alpha + it$, $n \ge n_0$, for any s in the strip $-\alpha_1 < \Re s < \alpha_1$. By applying Leibniz's rule for differentiation under an integral sign (extended to Stieltjes integrals), we find [5, p. 240] that

$$G_n''(\alpha) = \int_{-\infty}^{\infty} x^2e^{\alpha x}\,dF_n(x), \qquad |\alpha| < \alpha_1,$$

from which it appears that $G_n''(\alpha) \ge 0$, $|\alpha| < \alpha_1$. This means that the function $G_n(\alpha)$ assumes its maximum value in the interval $|\alpha| \le \alpha_2$ at either or both endpoints of the interval. But of course $G_n(\alpha_2)$ and $G_n(-\alpha_2)$ both approach finite limits as n becomes infinite, so it follows that the sequence $\{G_n(\alpha)\}$, $n \ge n_0$, is uniformly bounded in the interval $|\alpha| \le \alpha_2$. Thus the sequence $\{\varphi_n(s)\}$, $n \ge n_0$, is uniformly bounded in the strip $-\alpha_2 \le \Re s \le \alpha_2$, and moreover has a limit at each point of an infinite set possessing a limit point in the strip (i.e., at each point of the interval $-\alpha_2 \le s \le \alpha_2$). So by Vitali's Theorem [3, pp. 156-160, 240], there exists an analytic function $\varphi^*(s)$ such that $\varphi_n(s) \to \varphi^*(s)$ uniformly in each bounded closed subregion of the strip $-\alpha_2 < \Re s < \alpha_2$. Since $\varphi_n(it)$ is the c.f. of $X_n$, the existence of the limiting distribution follows from Theorem 2(b).





Of course, $\varphi^*(\alpha) = G(\alpha)$, $-\alpha_2 < \alpha < \alpha_2$. It remains to show that $\varphi^*(\alpha)$ is the m.g.f. of X. Theorem 2(b) states that $\varphi^*(it)$ is the c.f. of X. If we can show that the function $\varphi(s) = E(e^{sX})$ exists at least in the strip $-\alpha_2 < \Re s < \alpha_2$, then since $\varphi(s) = \varphi^*(s)$ on the axis of imaginaries, the equality must be valid in the entire strip, and so in particular on the interval of the real axis inside the strip.

It will suffice for this purpose to show that $\varphi(\alpha)$ exists for $-\alpha_2 \le \alpha \le \alpha_2$. Suppose indeed that $\varphi(\alpha)$ does not exist at some point $\alpha = \alpha_3$ in this interval. That means that if

$$M = \text{l.u.b.}\,[G_n(\alpha_3),\ n \ge n_0],$$

we can find a real number A such that

$$\int_{-A}^{A} e^{\alpha_3 x}\,dF(x) > M. \tag{1}$$


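But

$$\int_{-A}^{A} e^{\alpha_3 x}\,dF(x) = \int_{-A}^{A} e^{\alpha_3 x}\,dF_n(x) + \left[\int_{-A}^{A} e^{\alpha_3 x}\,dF(x) - \int_{-A}^{A} e^{\alpha_3 x}\,dF_n(x)\right]. \tag{2}$$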

Since $\lim_{n\to\infty} F_n(x) = F(x)$ at all continuity points of F(x), and so on an everywhere dense set of points, the Helly-Bray Theorem [5, p. 31] states that the expression in brackets in (2) approaches zero as n becomes infinite. Meanwhile

$$\int_{-A}^{A} e^{\alpha_3 x}\,dF_n(x) \le \int_{-\infty}^{\infty} e^{\alpha_3 x}\,dF_n(x) \le M, \qquad n \ge n_0.$$

Thus we arrive at the conclusion that the left member of (2) must be less than or equal to M, which contradicts (1).

To be sure, we have only proved that the m.g.f. of X is equal to $\varphi^*(\alpha)$ or $G(\alpha)$ in the open interval $-\alpha_2 < \alpha < \alpha_2$, and not in the corresponding closed interval, as promised. But because of the absolute (and therefore uniform) convergence of the integrals defining $G_n(\alpha)$ and $\varphi(\alpha)$, these functions must be continuous in the closed interval $-\alpha_2 \le \alpha \le \alpha_2$. Since … this interval, $G(\alpha)$ must also be continuous there. This implies that $\varphi(\alpha)$, the m.g.f. of X, is identically equal to $G(\alpha)$ in the closed interval, and the proof is complete.

It is perhaps worth while to point out explicitly that in the course of the foregoing argument we have proved this proposition:

Theorem 4. If a sequence of m.g.f.'s converges in an interval containing α = 0, then it will converge uniformly in every closed subinterval of the open interval, and the limit function is itself a m.g.f.

REFERENCES

[1] H. Cramér, Random Variables and Probability Distributions, Cambridge, 1937.

[2] D. R. Curtiss, Analytic Functions of a Complex Variable, Chicago, 1926.

[3] P. Dienes, The Taylor Series, Oxford, 1931.

[4] P. Lévy, Théorie de l'Addition des Variables Aléatoires, Paris, 1937.

[5] D. V. Widder, The Laplace Transform, Princeton, 1941.



434 


ABRAHAM WALD 


ON THE POWER FUNCTION OF THE ANALYSIS OF VARIANCE TEST¹

By Abraham Wald 
Columbia University 


It is known² that the general problem of the analysis of variance can be reduced by an orthogonal transformation to the following canonical form: Let the variates $y_1, \ldots, y_p, z_1, \ldots, z_n$ be independently and normally distributed with a common unknown variance $\sigma^2$. The mean values of $z_1, \ldots, z_n$ are known to be zero, and the mean values $\eta_1, \ldots, \eta_p$ of the variates $y_1, \ldots, y_p$ are unknown. The canonical form of the analysis of variance test is the test of the hypothesis that

$$\eta_1 = \eta_2 = \cdots = \eta_r = 0 \qquad (r < p), \tag{1}$$

where a single observation is made on each of the variates $y_1, \ldots, y_p, z_1, \ldots, z_n$.

In the theory of the analysis of variance the test of the hypothesis (1) is based on the critical region

$$\frac{y_1^2 + \cdots + y_r^2}{z_1^2 + \cdots + z_n^2} > c, \tag{2}$$

where the constant c is chosen so that the size of the critical region is equal to the level of significance α we wish to have. The critical region (2) is identical with the critical region

$$\frac{y_1^2 + \cdots + y_r^2}{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2} > c' = \frac{c}{c + 1}. \tag{3}$$

It is known that the power function of the critical region (3) depends only on the single parameter

$$\lambda = \frac{1}{\sigma^2}\sum_{i=1}^{r}\eta_i^2. \tag{4}$$
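The dependence on λ alone can be checked by simulation. A minimal Monte Carlo sketch (Python standard library; r = 2, n = 6, level 0.05, the seed, the sample sizes and all helper names are illustrative assumptions) estimates c' empirically under (1) and then the power of (3) at three parameter points sharing λ = 4. The variates $y_{r+1}, \ldots, y_p$ are omitted because they do not enter (3). The three estimates should agree up to Monte Carlo error.

```python
import random


def statistic(ys, zs):
    """Left-hand side of (3): sum of y_i^2 over (sum of y_i^2 + sum of z_j^2)."""
    num = sum(y * y for y in ys)
    return num / (num + sum(z * z for z in zs))


def sample_stat(rng, eta, sigma, n):
    ys = [rng.gauss(mu, sigma) for mu in eta]
    zs = [rng.gauss(0.0, sigma) for _ in range(n)]
    return statistic(ys, zs)


def critical_value(rng, r, n, alpha, trials):
    """Empirical (1 - alpha) quantile of the statistic under (1); sigma cancels out."""
    draws = sorted(sample_stat(rng, [0.0] * r, 1.0, n) for _ in range(trials))
    return draws[int((1.0 - alpha) * trials)]


def power(rng, eta, sigma, n, c_prime, trials):
    """Estimated probability that the sample point falls in region (3)."""
    return sum(sample_stat(rng, eta, sigma, n) > c_prime for _ in range(trials)) / trials


rng = random.Random(12345)
r, n, alpha, trials = 2, 6, 0.05, 20000
c_prime = critical_value(rng, r, n, alpha, trials)
# Three alternatives with lambda = (eta_1^2 + eta_2^2) / sigma^2 = 4
p_a = power(rng, [2.0, 0.0], 1.0, n, c_prime, trials)
p_b = power(rng, [2 ** 0.5, 2 ** 0.5], 1.0, n, c_prime, trials)
p_c = power(rng, [4.0, 0.0], 2.0, n, c_prime, trials)
print(round(c_prime, 3), p_a, p_b, p_c)
```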

Denote the power function of the critical region (3) by $\beta_0(\lambda)$. P. L. Hsu has proved³ the following optimum property of the region (3): Let W be a critical region which satisfies the following two conditions:

(a) The size of W is equal to the size of the region (3).


¹ Presented at a joint meeting of the Institute of Mathematical Statistics and the American Mathematical Society in New York, December, 1941.

² See for instance P. C. Tang, "The power function of the analysis of variance tests," Stat. Res. Mem., Vol. 2, 1938.

³ P. L. Hsu, "Analysis of variance from the power function standpoint," Biometrika, January, 1941.



ANALYSIS OF VARIANCE


435 


(b) The power function of W depends on the single parameter λ.

Then $\beta(\lambda) \le \beta_0(\lambda)$, where $\beta(\lambda)$ denotes the power function of W.

Condition (b) is a serious restriction in Hsu's result. In this paper we shall prove an optimum property of $\beta_0(\lambda)$ where $\beta_0(\lambda)$ is compared with the power function of any other critical region of size equal to that of (3).

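For any given values $\eta_{r+1}', \ldots, \eta_p'$, $\sigma'$ and λ denote by $S(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)$ the sphere defined by the equations

$$\eta_1^2 + \cdots + \eta_r^2 = \lambda\sigma'^2; \qquad \eta_i = \eta_i' \quad (i = r+1, \ldots, p); \qquad \sigma = \sigma'. \tag{5}$$

For any region W denote by $P_W(\eta_1, \ldots, \eta_p, \sigma)$ the power function of W, i.e. $P_W(\eta_1, \ldots, \eta_p, \sigma)$ denotes the probability that the sample point will fall within W calculated under the assumption that $\eta_1, \ldots, \eta_p$ and σ are the true values of the parameters. We will denote by $\gamma_W(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)$ the integral of the power function $P_W(\eta_1, \ldots, \eta_p, \sigma')$ over the surface $S(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)$ divided by the area of $S(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)$, i.e.

$$\gamma_W(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda) = \frac{\displaystyle\int_{S(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)} P_W(\eta_1, \ldots, \eta_p, \sigma')\,dA}{\displaystyle\int_{S(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)} dA}. \tag{6}$$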


We will prove the following

Theorem: If W is a critical region of size equal to that of (3), i.e. $P_W(0, \ldots, 0, \eta_{r+1}, \ldots, \eta_p, \sigma) = \beta_0(0)$, then

$$\gamma_W(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda) \le \beta_0(\lambda) \tag{7}$$

for arbitrary values $\eta_{r+1}', \ldots, \eta_p'$, σ' and λ.

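If W satisfies Hsu's condition (b), then the power function $P_W(\eta_1, \ldots, \eta_p, \sigma)$ is constant on the surface $S(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda)$, and therefore $\gamma_W(\eta_{r+1}', \ldots, \eta_p', \sigma', \lambda) = P_W(\eta_1, \ldots, \eta_p, \sigma')$ at any point of that surface, which by condition (b) equals $\beta(\lambda)$. Hence Hsu's result is an immediate consequence of our Theorem.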

Denote $\sqrt{y_1^2 + \cdots + y_r^2 + z_1^2 + \cdots + z_n^2}$ by t and for any values $a_{r+1}, \ldots, a_p, b$ let $R(a_{r+1}, \ldots, a_p, b)$ be the set of all sample points for which

$$y_i = a_i \quad (i = r+1, \ldots, p) \qquad\text{and}\qquad t = b.$$

For any region W of the sample space we denote by $W(y_{r+1}, \ldots, y_p, t)$ the common part of W and $R(y_{r+1}, \ldots, y_p, t)$.

In order to prove our Theorem we first show the validity of the following

Lemma 1: For any critical region Z there exists a function $\varphi_Z(y_{r+1}, \ldots, y_p, t)$ of the variables $y_{r+1}, \ldots, y_p, t$ such that the critical region $Z^*$ defined by the inequality

$$y_1^2 + \cdots + y_r^2 > \varphi_Z(y_{r+1}, \ldots, y_p, t)$$

satisfies the following two conditions:

(a) $P_Z(0, \ldots, 0, \eta_{r+1}, \ldots, \eta_p, \sigma) = P_{Z^*}(0, \ldots, 0, \eta_{r+1}, \ldots, \eta_p, \sigma)$,





(b) $\gamma_Z(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda) \le \gamma_{Z^*}(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)$.

Proof. Denote by $P_Z(y_{r+1}, \ldots, y_p, t)$ the conditional probability of $Z(y_{r+1}, \ldots, y_p, t)$ calculated under the condition that the sample point lies in $R(y_{r+1}, \ldots, y_p, t)$ and under the assumption that $\eta_1 = \cdots = \eta_r = 0$. Denote by $F(d, t)$ the conditional probability that

$$y_1^2 + \cdots + y_r^2 > d$$

calculated under the condition that the sample point lies in $R(y_{r+1}, \ldots, y_p, t)$ and under the assumption that $\eta_1 = \cdots = \eta_r = 0$. It is easy to verify that the values of $F(d, t)$ and $P_Z(y_{r+1}, \ldots, y_p, t)$ do not depend on the unknown parameters $\eta_{r+1}, \ldots, \eta_p, \sigma$. Since $F(d, t)$ is a continuous function of d and since $F(t^2, t) = 0$ (while $F(0, t) = 1$), there exists a function $\varphi_Z(y_{r+1}, \ldots, y_p, t)$ such that

$$F[\varphi_Z(y_{r+1}, \ldots, y_p, t), t] = P_Z(y_{r+1}, \ldots, y_p, t).$$

For this function $\varphi_Z(y_{r+1}, \ldots, y_p, t)$ the region $Z^*$ certainly satisfies condition (a) of Lemma 1. We will show that condition (b) is also satisfied. Consider the ratio

$$\frac{\displaystyle\int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} \exp\left\{-\frac{1}{2\sigma^2}\left[\sum_{i=1}^{r}(y_i - \eta_i)^2 + \sum_{i=r+1}^{p}(y_i - \eta_i)^2 + \sum_{j=1}^{n}z_j^2\right]\right\}dA}{\exp\left\{-\dfrac{1}{2\sigma^2}\left[\displaystyle\sum_{i=1}^{r}y_i^2 + \sum_{i=r+1}^{p}(y_i - \eta_i)^2 + \sum_{j=1}^{n}z_j^2\right]\right\}} \tag{8}$$


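Denote $\sqrt{y_1^2 + \cdots + y_r^2}$ by $r_y$. Since $\eta_1^2 + \cdots + \eta_r^2 = \lambda\sigma^2$ on S, the ratio (8) equals $e^{-\lambda/2}$ times the left hand side of the following equation, and we have

$$\int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} e^{\frac{1}{\sigma^2}\sum_{i=1}^{r}\eta_iy_i}\,dA = \int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} e^{\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\alpha(\eta)]}\,dA, \tag{9}$$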


where $\alpha(\eta)$ denotes the angle $(0 < \alpha(\eta) < \pi)$ between the vector y with the components $y_1, \ldots, y_r$ and the vector η with the components $\eta_1, \ldots, \eta_r$. Because of the symmetry of the sphere, the value of the right hand side of (9) is not changed if we substitute $\beta(\eta)$ for $\alpha(\eta)$, where $\beta(\eta)$ denotes the angle $(0 < \beta(\eta) < \pi)$ between the vector η and an arbitrarily chosen fixed vector u. Hence the value of the right hand side of (9) depends only on $r_y$, i.e.

$$\int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} e^{\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\beta(\eta)]}\,dA = I(r_y). \tag{10}$$

Now we will show that $I(r_y)$ is a monotonically increasing function of $r_y$. We have





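$$\frac{dI(r_y)}{dr_y} = \frac{\sqrt{\lambda}}{\sigma}\int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} \cos[\beta(\eta)]\,e^{\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\beta(\eta)]}\,dA. \tag{11}$$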


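Denote by $\omega_1$ the subset of $S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)$ in which $0 < \beta(\eta) < \pi/2$ and by $\omega_2$ the subset in which $\pi/2 < \beta(\eta) < \pi$. Because of the symmetry of the sphere we obviously have

$$\int_{\omega_2}\cos[\beta(\eta)]\,e^{\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\beta(\eta)]}\,dA = \int_{\omega_1}\cos[\pi - \beta(\eta)]\,e^{\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\pi - \beta(\eta)]}\,dA = -\int_{\omega_1}\cos[\beta(\eta)]\,e^{-\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\beta(\eta)]}\,dA. \tag{12}$$

Hence

$$\frac{dI(r_y)}{dr_y} = \frac{\sqrt{\lambda}}{\sigma}\int_{\omega_1}\cos[\beta(\eta)]\left\{e^{\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\beta(\eta)]} - e^{-\frac{\sqrt{\lambda}}{\sigma}r_y\cos[\beta(\eta)]}\right\}dA. \tag{13}$$

The right hand side of (13) is positive. Hence $I(r_y)$, and therefore also the left hand side of (8), is a monotonically increasing function of $r_y$.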
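For r = 2 the sphere section in (10) is a circle and, writing s for $\sqrt{\lambda}\,r_y/\sigma$, $I(r_y)$ is proportional to the mean of $e^{s\cos\beta}$ over the circle, which is the modified Bessel function $I_0(s)$. A minimal numerical check of the two facts used above (Python; the helper names are hypothetical): the mean increases with s, and the paired integrand of (13) is positive on $\omega_1$.

```python
import math


def circle_average(s, points=2000):
    """Mean of exp(s cos(beta)) over the circle by the midpoint rule (the case r = 2 of (10))."""
    total = 0.0
    for k in range(points):
        beta = 2.0 * math.pi * (k + 0.5) / points
        total += math.exp(s * math.cos(beta))
    return total / points


def paired_term(s, beta):
    """Integrand of (13) on omega_1, with s standing for sqrt(lambda) r_y / sigma."""
    c = math.cos(beta)
    return c * (math.exp(s * c) - math.exp(-s * c))


for s in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(s, circle_average(s))
```

Because the integrand is periodic and smooth, the midpoint rule is extremely accurate here; at s = 1 it reproduces $I_0(1) \approx 1.2660659$.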

Let $P_Z(y_{r+1}', \ldots, y_p', t', \eta_1, \ldots, \eta_p, \sigma)\,dy_{r+1}\cdots dy_p\,dt$ be the probability that the sample point will fall in the intersection of Z and the set

$$y_i' - \tfrac12 dy_i < y_i < y_i' + \tfrac12 dy_i \quad (i = r+1, \ldots, p), \qquad t' - \tfrac12 dt < t < t' + \tfrac12 dt.$$

Similarly let $P_{Z^*}(y_{r+1}', \ldots, y_p', t', \eta_1, \ldots, \eta_p, \sigma)\,dy_{r+1}\cdots dy_p\,dt$ be the unconditional probability that the sample point will fall in the intersection of $Z^*$ and the set

$$y_i' - \tfrac12 dy_i < y_i < y_i' + \tfrac12 dy_i \quad (i = r+1, \ldots, p), \qquad t' - \tfrac12 dt < t < t' + \tfrac12 dt.$$

Since the function $\varphi_Z(y_{r+1}, \ldots, y_p, t)$ has been defined so that

$$P_Z(y_{r+1}, \ldots, y_p, t) = P_{Z^*}(y_{r+1}, \ldots, y_p, t),$$

we obviously have

$$P_Z(y_{r+1}, \ldots, y_p, t, 0, \ldots, 0, \eta_{r+1}, \ldots, \eta_p, \sigma) = P_{Z^*}(y_{r+1}, \ldots, y_p, t, 0, \ldots, 0, \eta_{r+1}, \ldots, \eta_p, \sigma). \tag{14}$$


Using a lemma⁴ by Neyman and Pearson, we easily obtain

$$\int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} P_Z(y_{r+1}, \ldots, y_p, t, \eta_1, \ldots, \eta_p, \sigma)\,dA \le \int_{S(\eta_{r+1}, \ldots, \eta_p, \sigma, \lambda)} P_{Z^*}(y_{r+1}, \ldots, y_p, t, \eta_1, \ldots, \eta_p, \sigma)\,dA \tag{15}$$

⁴ J. Neyman and E. S. Pearson, "Contributions to the theory of testing statistical hypotheses," Stat. Res. Mem., Vol. 1, London, 1936.



438 


ABRAHAM WALD 


from (14) and the fact that the left hand side of (8) is a monotonically increasing function of $r_y^2 = y_1^2 + \cdots + y_r^2$. Condition (b) is an immediate consequence of (15). Hence Lemma 1 is proved.

For the proof of our theorem we will also need the following

Lemma 2: Let $v_1,\ldots,v_k$ be $k$ normally and independently distributed variates with a common variance $\sigma^2$. Denote the mean value of $v_i$ by $a_i$ ($i = 1,\ldots,k$) and let $f(v_1,\ldots,v_k,\sigma)$ be a function of the variables $v_1,\ldots,v_k$ and $\sigma$ which does not involve the mean values $a_1,\ldots,a_k$. Then, if the expected value of $f(v_1,\ldots,v_k,\sigma)$ is equal to zero identically in $a_1,\ldots,a_k$, $f(v_1,\ldots,v_k,\sigma)$ is identically equal to zero, except perhaps on a set of measure zero.

Proof: Lemma 2 is obviously proved for all values of $\sigma$ if we prove it for $\sigma = 1$. Hence we will assume that $\sigma = 1$. It is known that a $k$-variate distribution which has moments equal to those of the joint distribution of $v_1,\ldots,v_k$ must be identical with the joint distribution of $v_1,\ldots,v_k$. That is to say, the joint distribution of $v_1,\ldots,v_k$ is uniquely determined by its moments. Hence if


$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} v_1^{r_1}\cdots v_k^{r_k}\, g(v_1,\ldots,v_k)\, e^{-\frac12\sum_{i=1}^{k}(v_i - a_i)^2}\, dv_1\cdots dv_k = 0 \tag{16}$$

for any set $(r_1,\ldots,r_k)$ of non-negative integers, then $g(v_1,\ldots,v_k)$ must be equal to zero except perhaps on a set of measure zero. Now let $f(v_1,\ldots,v_k)$ be a function whose expected value is zero, i.e.

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(v_1,\ldots,v_k)\, e^{-\frac12\sum_{i=1}^{k}(v_i - a_i)^2}\, dv_1\cdots dv_k = 0 \tag{17}$$

identically in $a_1,\ldots,a_k$. From (17) it follows that


$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(v_1,\ldots,v_k)\, e^{-\frac12\sum_{i=1}^{k} v_i^2 + \sum_{i=1}^{k} a_i v_i}\, dv_1\cdots dv_k = 0 \tag{18}$$

identically in $a_1,\ldots,a_k$. Differentiating the left hand side of (18) $r_1$ times with respect to $a_1$, $r_2$ times with respect to $a_2$, ..., and $r_k$ times with respect to $a_k$, we obtain


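$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} v_1^{r_1}\cdots v_k^{r_k}\, f(v_1,\ldots,v_k)\, e^{-\frac12\sum_{i=1}^{k} v_i^2 + \sum_{i=1}^{k} a_i v_i}\, dv_1\cdots dv_k = 0. \tag{19}$$

From (16) and (19) it follows that $f(v_1,\ldots,v_k) = 0$ except perhaps on a set of measure zero; indeed, the exponential factor in (19) differs from that in (16) only by the constant factor $e^{\frac12\sum a_i^2}$, so (19) is (16) with $g = f$. Hence Lemma 2 is proved.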

Using Lemmas 1 and 2 we can easily prove our theorem. Because of Lemma 1 we can restrict ourselves to critical regions $W$ which are given by an inequality of the following type

$$y_1^2 + \cdots + y_r^2 \ge \varphi(y_{r+1},\ldots,y_p,t)$$

where $\varphi(y_{r+1},\ldots,y_p,t)$ is some function of $y_{r+1},\ldots,y_p$ and $t$. The above inequality can be written as





$$\frac{y_1^2 + \cdots + y_r^2}{t} \ge \psi(y_{r+1},\ldots,y_p,t). \tag{20}$$

For any given values of $y_{r+1},\ldots,y_p,t$ denote by $P(y_{r+1},\ldots,y_p,t)$ the conditional probability that (20) holds, calculated under the assumption that $\eta_1 = \cdots = \eta_r = 0$. It is obvious that $P(y_{r+1},\ldots,y_p,t)$ does not depend on the unknown parameters $\eta_{r+1},\ldots,\eta_p,\sigma$. If we denote by $W$ the critical region defined by the inequality (20), we have

$$\beta_W(0,\ldots,0,\eta_{r+1},\ldots,\eta_p,\sigma) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\int_0^{\infty} P(y_{r+1},\ldots,y_p,t)\, p_1(y_{r+1},\ldots,y_p \mid \eta_{r+1},\ldots,\eta_p,\sigma)\, p_2(t,\sigma)\, dy_{r+1}\cdots dy_p\, dt, \tag{21}$$

where $p_1(y_{r+1},\ldots,y_p \mid \eta_{r+1},\ldots,\eta_p,\sigma)$ denotes the joint probability density function of $y_{r+1},\ldots,y_p$ and $p_2(t,\sigma)$ denotes the probability density function of $t$ calculated under the assumption that $\eta_1 = \cdots = \eta_r = 0$. In order to satisfy the condition of our Theorem, the function $\psi$ in (20) must be chosen so that


$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}\int_0^{\infty} P(y_{r+1},\ldots,y_p,t)\, p_1(y_{r+1},\ldots,y_p \mid \eta_{r+1},\ldots,\eta_p,\sigma)\, p_2(t,\sigma)\, dy_{r+1}\cdots dy_p\, dt = \beta_0(0). \tag{22}$$


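Let

$$\int_0^{\infty} P(y_{r+1},\ldots,y_p,t)\, p_2(t,\sigma)\, dt = Q(y_{r+1},\ldots,y_p,\sigma). \tag{23}$$

Then we obtain from (22)

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} Q(y_{r+1},\ldots,y_p,\sigma)\, p_1(y_{r+1},\ldots,y_p \mid \eta_{r+1},\ldots,\eta_p,\sigma)\, dy_{r+1}\cdots dy_p = \beta_0(0). \tag{24}$$

From (24) and Lemma 2 (applied to the function $Q - \beta_0(0)$) it follows that

$$Q(y_{r+1},\ldots,y_p,\sigma) = \beta_0(0) \tag{25}$$

except perhaps on a set of measure zero. From (23), (25) and a result by P. L. Hsu⁵ we obtain

$$P(y_{r+1},\ldots,y_p,t) = \beta_0(0) \tag{26}$$

except perhaps on a set of measure zero.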

It follows easily from (26) that $\psi(y_{r+1},\ldots,y_p,t)$ is equal to a fixed constant except perhaps on a set of measure zero. This proves our Theorem.


⁵ P. L. Hsu, "Notes on Hotelling's generalized T," Annals of Math. Stat., Vol. 9, p. 237.



440 


EDWARD PAULSON


A NOTE ON THE ESTIMATION OF SOME MEAN VALUES FOR A 
BIVARIATE DISTRIBUTION 

By Edward Paulson*

Columbia University

In this paper two problems are discussed which were suggested by the theory of representative sampling [1], but which also occur in several other fields. The first problem is to set up confidence limits for $m_x/m_y$, the ratio of the mean values of the variates $x$ and $y$. This comes up in the following situation. Let a population $\pi$ consist of $N$ units $x_1, x_2, \ldots, x_N$ and suppose we wish to set up confidence limits for the mean $\bar X = \sum_{i=1}^{N} x_i / N$. Also assume the population $\pi$ has been divided into $M$ groups; let $v_j$ be the number of individuals in the $j$th group and $u_j$ the sum of the values of $x$ for the $v_j$ individuals in the $j$th group, so that

$$\bar X = \frac{u_1 + u_2 + \cdots + u_M}{v_1 + v_2 + \cdots + v_M}.$$

Now if a random sample of $n$ out of the $M$ groups is taken, yielding observations $(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n)$, and $N$ is unknown, the determination of confidence limits for $\bar X$ clearly becomes a special case of the first problem. The distribution of a ratio, discussed by Geary [2], does not seem to be well adapted for this purpose.

The second problem, which is of greater practical interest, arises when we again have a random sample $(u_1, v_1), \ldots, (u_n, v_n)$ of $n$ out of $M$ groups and $N$ and $M$ are known. The standard estimate of $\bar X$ that has usually been made is $\frac{M}{N}\bar u$, where $\bar u = \sum_{i=1}^{n} u_i / n$. This estimate does not utilize the fact that the $n$ observations on $v$ can be used to increase the precision of the estimate of the numerator of $\bar X$. This is a special case of problem 2, which we can now formulate as how best to estimate $m_x$ (the mean value of a trait $x$) both by a point and by an interval, when for each unit in the sample observations both on $x$ and on a correlated variate $y$ are obtainable, and $m_y$ is known a priori. Situations of this type occur fairly often. It is possible to reduce the second problem to the first by using $\frac{\bar x}{\bar y} m_y$ as the estimate of $m_x$, and by multiplying the confidence limits for $m_x/m_y$ by $m_y$ to secure limits for $m_x$, but this will not usually be the most efficient procedure.

In both problems two cases will be distinguished: (a) when $\sigma_x^2$, $\sigma_y^2$ and $\rho$ are known a priori, and (b) when they are unknown. To determine confidence


* Work done under a grant-in-aid from the Carnegie Corporation of New York. 



NOTE ON ESTIMATION 


441 


limits for $m_x/m_y$, it will first be assumed that the probability density $f(x, y)$ of $x$ and $y$ is

$$f(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-m_x}{\sigma_x}\right)^2 - 2\rho\,\frac{(x-m_x)(y-m_y)}{\sigma_x\sigma_y} + \left(\frac{y-m_y}{\sigma_y}\right)^2\right]\right\}. \tag{1.1}$$

Denote the ratio $m_x/m_y$ by $K$ (assuming $m_y \ne 0$), and suppose it is desired to test the hypothesis that $K = K_0$ on the basis of a sample of $n$ independent observations $(x_1, y_1), \ldots, (x_n, y_n)$.

Let $z_i = x_i - K y_i$ and $\bar z = \sum_{i=1}^{n} z_i / n$. Since $z$ is a linear function of $x$ and $y$ it must be normally distributed, and its mean value is obviously zero. Therefore

$$u(K) = \frac{\sqrt{n}\,\bar z}{\sigma_z} = \frac{\sqrt{n}\,(\bar x - K\bar y)}{\sqrt{\sigma_x^2 - 2K\rho\sigma_x\sigma_y + K^2\sigma_y^2}} \tag{1.2}$$

will be normally distributed about zero with unit variance, and the hypothesis is rejected if $|u(K_0)| > u_\alpha$, where $\frac{1}{\sqrt{2\pi}}\int_{u_\alpha}^{\infty} e^{-t^2/2}\, dt = \frac12\alpha$. It is easy to show that this test is equivalent to that based on the likelihood ratio.

Confidence limits for $K$ would now be given by the values of $K$ satisfying the inequality

$$\left|\frac{\sqrt{n}\,(\bar x - K\bar y)}{\sigma_z}\right| < u_\alpha,$$

provided they always constituted a closed non-empty interval. This is equivalent here to the requirement that $K$ be a real valued monotonic function of $u$ in the interval $-\infty < u < \infty$; this requirement is unfortunately never exactly fulfilled, as can be seen from the graph of (1.2) (in the $u, K$ plane), for the curve has two horizontal asymptotes $u = \pm\sqrt{n}\,\bar y/\sigma_y$, and one maximum or minimum point (unless $\bar x/\bar y = \rho\sigma_x/\sigma_y$). However, $K$ will always be a monotonic function of $u$ in the interval $-u_\alpha < u < u_\alpha$ provided $|\sqrt{n}\,\bar y/\sigma_y| > u_\alpha$. Since $m_y \ne 0$, by taking $n$ sufficiently large the probability that $|\sqrt{n}\,\bar y/\sigma_y| < u_\alpha$ can be made arbitrarily small. Moreover, for values of $\alpha$ ordinarily used, in most practical problems the value of $m_y/\sigma_y$ will be such that even for quite small samples the probability that $|\sqrt{n}\,\bar y/\sigma_y| < u_\alpha$ (that is, the probability of getting a sample for which the values of $K$ that are accepted will not form a real interval) will be quite negligible. For example, let $\alpha$ have the conventional value .05, and suppose $m_y/\sigma_y = 2$; then for $n = 9$, $\text{Prob.}\{|\sqrt{n}\,\bar y/\sigma_y| < 1.96\} < 10^{-4}$, and for $n = 16$, $\text{Prob.}\{|\sqrt{n}\,\bar y/\sigma_y| < 1.96\} < 10^{-9}$. Subject to these rather weak restrictions on the order of magnitude of $n$ and $m_y/\sigma_y$, the confidence limits for $K$ are

$$K = \frac{(n\bar x\bar y - u_\alpha^2\rho\sigma_x\sigma_y) \pm \sqrt{(n\bar x\bar y - u_\alpha^2\rho\sigma_x\sigma_y)^2 - (n\bar y^2 - u_\alpha^2\sigma_y^2)(n\bar x^2 - u_\alpha^2\sigma_x^2)}}{n\bar y^2 - u_\alpha^2\sigma_y^2}. \tag{1.3}$$
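The two tail probabilities quoted in the example can be checked directly. Under (1.1), $\sqrt{n}\,\bar y/\sigma_y$ is normal with mean $\sqrt{n}\,m_y/\sigma_y$ and unit variance, so the probability that the accepted values of $K$ fail to form an interval is an ordinary normal probability. A minimal sketch using Python's standard `statistics.NormalDist` (the function name `prob_no_interval` is hypothetical):

```python
from statistics import NormalDist

def prob_no_interval(n, my_over_sy, crit=1.96):
    """Prob{|sqrt(n)*ybar/sigma_y| < crit} when sqrt(n)*ybar/sigma_y
    is normal with mean sqrt(n)*m_y/sigma_y and unit variance."""
    dist = NormalDist(mu=(n ** 0.5) * my_over_sy, sigma=1.0)
    return dist.cdf(crit) - dist.cdf(-crit)

for n in (9, 16):
    print(n, prob_no_interval(n, 2.0))
```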


In case (b), when $\sigma_x^2$, $\sigma_y^2$ and $\rho$ are unknown, each $z_i = x_i - K y_i$ is still normally and independently distributed with zero mean and a common variance. It follows that

$$t = \frac{\sqrt{n}\,(\bar x - K\bar y)}{\sqrt{s_x^2 - 2K r s_x s_y + K^2 s_y^2}} \tag{1.4}$$

will have Student's distribution with $n - 1$ degrees of freedom. Subject to practically the same restriction as before, the confidence limits for $K$ as determined from (1.4) are

$$K = \frac{(n\bar x\bar y - t_\alpha^2 r s_x s_y) \pm \sqrt{(n\bar x\bar y - t_\alpha^2 r s_x s_y)^2 - (n\bar y^2 - t_\alpha^2 s_y^2)(n\bar x^2 - t_\alpha^2 s_x^2)}}{n\bar y^2 - t_\alpha^2 s_y^2}, \tag{1.5}$$

where $t_\alpha$ is the critical value of Student's distribution (for $n - 1$ degrees of freedom), $s_x^2 = \sum (x_i - \bar x)^2/(n-1)$, $s_y^2 = \sum (y_i - \bar y)^2/(n-1)$, and $r$ is the sample correlation between $x$ and $y$.
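Both (1.3) and (1.5) are the two roots of one quadratic in $K$, obtained by squaring the acceptance inequality. The sketch below (hypothetical function name `ratio_limits`) takes the critical value as an argument: $u_\alpha$ together with the known $\sigma_x, \sigma_y, \rho$ in case (a), or $t_\alpha$ together with $s_x, s_y, r$ in case (b). The critical value is assumed to be supplied by the caller, for example from a table; the input numbers in the usage line are made up.

```python
import math

def ratio_limits(xbar, ybar, sx, sy, rho, n, crit):
    """Roots of (n*ybar^2 - c^2*sy^2) K^2 - 2 (n*xbar*ybar - c^2*rho*sx*sy) K
    + (n*xbar^2 - c^2*sx^2) = 0, i.e. the limits (1.3) or (1.5)."""
    c2 = crit * crit
    a = n * ybar ** 2 - c2 * sy ** 2           # denominator of (1.3)/(1.5)
    b = n * xbar * ybar - c2 * rho * sx * sy   # first term of the numerator
    c = n * xbar ** 2 - c2 * sx ** 2
    if a <= 0:
        # |sqrt(n)*ybar/sy| <= crit: the accepted K do not form a finite interval
        raise ValueError("accepted values of K do not form a finite interval")
    root = math.sqrt(b * b - a * c)
    return (b - root) / a, (b + root) / a

print(ratio_limits(10.0, 5.0, 2.0, 1.0, 0.5, 25, 1.96))
```

When the leading coefficient is positive the quadratic is negative at $K = \bar x/\bar y$, so the square root is always real in that branch.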

When the distribution of $x$ and $y$ deviates considerably from a bivariate normal one, it would still appear that as a practical matter much the same methods could be used. The basis for this is the fact that there is considerable experimental evidence [3], [4] to show that the distribution of the mean of a sample drawn from any population likely to be encountered in practice will approach normality very rapidly even for $n$ quite small. Hence $\bar z$ and $u$ can be regarded as normally distributed for $n$ say $> 25$, and the confidence limits for $m_x/m_y$ will then be given by (1.3); in case (b) a somewhat larger sample is required to diminish the error in estimating $\sigma_z$. But for $n$ say $> 50$, $t$ will have a distribution close to normal and the confidence limits for $K$ are given by (1.5) (with $t_\alpha$ replaced by $u_\alpha$). The statements for the non-normal case appear as a practical matter to also hold when the sample is drawn from a finite population of $N$ units without replacement if $N - n$ is not too small, provided $n$ is replaced by $n(N-1)/(N-n)$.

In the second problem we again start by assuming the distribution of $x$ and $y$ is given by (1.1). For case (a), $m_x$ is the only unknown parameter. If $P = \prod_{i=1}^{n} f(x_i, y_i \mid m_x)$, then

$$\frac{\partial \log P}{\partial m_x} = \frac{1}{2(1-\rho^2)}\left\{\frac{2\sum (x_i - m_x)}{\sigma_x^2} - \frac{2\rho\sum (y_i - m_y)}{\sigma_x\sigma_y}\right\}$$

and the maximum likelihood estimate $\hat m_1$ of $m_x$ is

$$\hat m_1 = \bar x - \frac{\sigma_{xy}}{\sigma_y^2}(\bar y - m_y), \tag{1.6}$$

where $\sigma_{xy} = \rho\sigma_x\sigma_y$. Also $\hat m_1$ is a sufficient statistic, and the confidence interval given by the set of values of $m_x$ satisfying

$$\left|\frac{\sqrt{n}\,(\hat m_1 - m_x)}{\sigma_x\sqrt{1-\rho^2}}\right| < u_\alpha$$

will be a "shortest unbiased confidence interval" in the sense of Neyman.

Case (b) will be more important, since the exact values of the variances and covariance will usually be unknown. By analogy with (1.6), a similar estimate of $m_x$ for this case is

$$\hat m_2 = \bar x - \frac{s_{xy}}{s_y^2}(\bar y - m_y). \tag{1.7}$$

This is precisely the least squares estimate of $x$ corresponding to $y = m_y$, and has been used for this problem before; for example, it is discussed by Cochran [5]. We shall discuss some additional aspects of the problem, and also mention the application to the special case of representative sampling by groups.
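A minimal sketch of the estimate (1.7) from paired observations and a known $m_y$ (the function name is hypothetical). Since $s_{xy}/s_y^2$ is a ratio, the common divisor of the sample moments cancels, so raw sums of products suffice.

```python
def regression_estimate(xs, ys, m_y):
    """m2_hat = xbar - (s_xy / s_y^2) * (ybar - m_y), formula (1.7)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    return xbar - (sxy / syy) * (ybar - m_y)

# Made-up points lying exactly on x = 3 + 2y, so the estimate is 3 + 2*m_y.
print(regression_estimate([5.0, 7.0, 9.0, 11.0], [1.0, 2.0, 3.0, 4.0], 2.0))
```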

When the bivariate distribution of $x$ and $y$ is such that the conditional distribution of each $x_i$ is normal with mean $A + By_i$ and a common variance, then Professor Wald has suggested that exact confidence limits for $m_x$ for small samples can be secured by using the standard methods of the theory of least squares. The resulting confidence limits are easily seen to be

$$\hat m_2 \pm \frac{t_\alpha}{\sqrt{n-2}}\sqrt{\lambda},$$

where

$$\lambda = (1 - r^2)\sum_{i=1}^{n}(x_i - \bar x)^2\left[\frac{1}{n} + \frac{(m_y - \bar y)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2}\right]$$

and $t_\alpha$ is the critical value of Student's distribution with $n - 2$ degrees of freedom at a level of significance $= \alpha$.
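Under the linear-regression assumption these limits can be computed as in the following sketch (hypothetical function name; the data are made up). The Python standard library has no Student-$t$ quantile function, so $t_\alpha$ for $n - 2$ degrees of freedom is passed in by the caller.

```python
import math

def least_squares_limits(xs, ys, m_y, t_alpha):
    """Limits m2_hat +/- t_alpha / sqrt(n - 2) * sqrt(lambda), with
    lambda = (1 - r^2) * S_xx * (1/n + (m_y - ybar)^2 / S_yy)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r2 = sxy * sxy / (sxx * syy)                # squared sample correlation
    m2_hat = xbar - (sxy / syy) * (ybar - m_y)  # estimate (1.7)
    lam = (1.0 - r2) * sxx * (1.0 / n + (m_y - ybar) ** 2 / syy)
    half = t_alpha / math.sqrt(n - 2) * math.sqrt(lam)
    return m2_hat - half, m2_hat + half

# Five made-up pairs; 3.182 is the two-sided 5% point of t with 3 d.f.
xs = [2.1, 3.9, 6.2, 7.8, 10.1]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
print(least_squares_limits(xs, ys, 3.5, 3.182))
```

The factor $(1 - r^2)\sum(x_i - \bar x)^2$ is the residual sum of squares of the fitted line, so the half-width is the usual $t_\alpha$ times the standard error of the fitted value at $y = m_y$.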

The requirement that the regression of $x$ on $y$ be linear is rather stringent, although it may often be fulfilled, especially in the case of representative sampling mentioned in the opening paragraph. When the regression of $x$ on $y$ is nonlinear, the estimate given by (1.7) requires some further justification. Let $u_{ij} = E(x^i y^j)$, where $E$ denotes the mean value, and assume that we have $n$ independent pairs of observations and that the moments $u_{ij}$ with $i + j \le 4$ are all finite. It then follows from a theorem of Doob [6] that $\sqrt{n}(\hat m_2 - m_x)$ tends, with increasing $n$, to a limiting distribution which is normal with zero mean and variance equal to $\sigma_x^2(1 - \rho^2)$.

The estimate $\bar x$ is clearly always less efficient than $\hat m_2$ unless $\rho = 0$. The estimate $\frac{\bar x}{\bar y} m_y$ is known to have a large sample variance

$$V = \frac{1}{n}\left[\sigma_x^2 - 2\frac{m_x}{m_y}\sigma_{xy} + \left(\frac{m_x}{m_y}\right)^2\sigma_y^2\right],$$

so $\frac{\bar x}{\bar y} m_y$ is always less efficient than $\hat m_2$ unless $m_x = \frac{\sigma_{xy}}{\sigma_y^2} m_y$, at which point $V$ attains its minimum value $\frac{\sigma_x^2(1-\rho^2)}{n}$. In fact $\hat m_2$ can be easily shown to have an efficiency $\ge$ that of any other statistic of the class $Q$ (which includes $\bar x$ and $\frac{\bar x}{\bar y} m_y$) consisting of all statistics $q$ satisfying two conditions: (1) that $\sqrt{n}(q - m_x)$ have a distribution approaching normality with zero mean and finite variance $\sigma_q^2$, and (2) that $\sigma_q^2$ be independent of the joint density function of $x$ and $y$, involving only certain of the moments $u_{ij}$. A rather artificial member of the class $Q$ is

q ~ f—— — -* (\/§ — -\Zm y ). The proof consists merely in observing that 

log Vty 8y 

if for some bivariate distribution $\sigma_x^2(1 - \rho^2) > \sigma_q^2$, this would also have to be true when the distribution of $x$ and $y$ is a bivariate normal one, which is impossible, since $\sigma_x^2(1 - \rho^2)$ is then the variance of $\sqrt{n}(\hat m_1 - m_x)$, $\hat m_1$ being the maximum likelihood statistic.
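The comparison of $\bar x$, $\frac{\bar x}{\bar y}m_y$ and $\hat m_2$ rests on the large-sample variances above. The sketch below (hypothetical helper names; the population moments are made-up numbers) evaluates $V$ as a function of $K = m_x/m_y$ and checks that its minimum, reached at $K = \sigma_{xy}/\sigma_y^2$, equals $\sigma_x^2(1-\rho^2)/n$, while $K = 0$ gives the variance $\sigma_x^2/n$ of $\bar x$.

```python
def ratio_estimate_variance(var_x, var_y, cov_xy, K, n):
    """Large-sample variance V of (xbar/ybar)*m_y, with K = m_x/m_y."""
    return (var_x - 2.0 * K * cov_xy + K * K * var_y) / n

def regression_estimate_variance(var_x, var_y, cov_xy, n):
    """Large-sample variance sigma_x^2 (1 - rho^2) / n of m2_hat."""
    rho2 = cov_xy * cov_xy / (var_x * var_y)
    return var_x * (1.0 - rho2) / n

var_x, var_y, cov_xy, n = 4.0, 1.0, 1.2, 50
k_star = cov_xy / var_y  # the value of K at which V is smallest
print(ratio_estimate_variance(var_x, var_y, cov_xy, k_star, n))
print(regression_estimate_variance(var_x, var_y, cov_xy, n))
```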

For moderate values of $n$, say $n > 100$, fairly exact confidence limits for $m_x$ will be given by $\hat m_2 \pm \frac{u_\alpha}{\sqrt{n}}\sqrt{s_x^2(1 - r^2)}$. When the sample is drawn from a

finite population of $N$ units without replacement, the confidence limits for $n > 100$ are $\hat m_2 \pm \frac{u_\alpha}{\sqrt{n}}\sqrt{s_x^2(1 - r^2)}\sqrt{\frac{N-n}{N-1}}$.

In the problem of estimating $m_x = \bar X$ for the population $\pi$ discussed in the opening paragraph, which consists of $N$ individuals divided into $M$ groups, on the basis of a random sample $(u_1, v_1), (u_2, v_2), \ldots, (u_n, v_n)$ of $n$ out of the $M$



SIGNIFICANCE LEVELS 


445 


groups, an efficient estimate will be

$$m' = \frac{M}{N}\left[\bar u - \frac{s_{uv}}{s_v^2}\left(\bar v - \frac{N}{M}\right)\right].$$

The efficiency of $m'$ relative to the conventional estimate $\frac{M}{N}\bar u$ is $(1 - \rho_{uv}^2)^{-1}$, which ordinarily would seem to be quite large. This is easily extended to the case where $\pi$ is divided into $l$ strata with $M_i$ groups comprising $N_i$ individuals in the $i$th stratum, when a random sample of $m_i$ out of the $M_i$ groups in each stratum is taken. Let $v_{ij}$ be the number of individuals in the $j$th group of the $i$th stratum and $u_{ij}$ denote the sum of the values of $x$ for these $v_{ij}$ individuals. The estimate of $m_x$ becomes

$$m'' = \frac{1}{N}\sum_{i=1}^{l} M_i\left[\bar u_i - \frac{s_{u_i v_i}}{s_{v_i}^2}\left(\bar v_i - \frac{N_i}{M_i}\right)\right].$$

If $\sum_{i=1}^{l} m_i = m$ is fixed, the large sample variance of $m''$ will be a minimum if $m_i$ is proportional to $M_i\sigma_{u_i}\sqrt{1 - \rho_i^2}$, where $\rho_i$ is the correlation between $u$ and $v$ in the $i$th stratum.
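The optimum allocation just stated can be sketched as follows, assuming the stratum quantities $M_i$, $\sigma_{u_i}$ and $\rho_i$ are known (hypothetical function name; the inputs in the usage line are made up). The result is generally fractional; rounding to whole groups is left to the user.

```python
import math

def optimum_allocation(m, M, sigma_u, rho):
    """Split m sampled groups so that m_i is proportional to
    M_i * sigma_{u_i} * sqrt(1 - rho_i^2)."""
    weights = [Mi * si * math.sqrt(1.0 - ri * ri)
               for Mi, si, ri in zip(M, sigma_u, rho)]
    total = sum(weights)
    return [m * w / total for w in weights]

print(optimum_allocation(100, M=[10, 20], sigma_u=[1.0, 1.0], rho=[0.8, 0.0]))
```

A stratum where $u$ is closely correlated with $v$ receives fewer sampled groups, since the regression adjustment already removes much of its variability.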

In conclusion, the writer wishes to thank Professor A. Wald for his advice 
and encouragement, and Mr. Henry Goldberg for several suggestions. 


REFERENCES

[1] J. Neyman, "On the two different aspects of the representative method," Jour. Roy. Stat. Soc., Vol. 97 (1934), pp. 558-606.

[2] R. C. Geary, "The frequency distribution of the quotient of two normal variates," Jour. Roy. Stat. Soc., Vol. 93 (1930), pp. 442-446.

[3] W. A. Shewhart, Economic Control of Quality of Manufactured Product, New York (1931), pp. 182-183.

[4] H. C. Carver, "Fundamentals in the theory of sampling," Annals of Math. Stat., Vol. 1 (1930), pp. 110-112.

[5] W. G. Cochran, "The use of the analysis of variance in enumeration by sampling," Jour. Amer. Stat. Assoc., Vol. 34 (1939), pp. 492-510.

[6] J. L. Doob, "The limiting distribution of certain statistics," Annals of Math. Stat., Vol. 6 (1935), p. 166.


SIGNIFICANCE LEVELS FOR THE RATIO OF THE MEAN SQUARE 
SUCCESSIVE DIFFERENCE TO THE VARIANCE 

By B. I. Hart

Ballistic Research Laboratory, Aberdeen Proving Ground 

For purposes of practical application in connection with significance tests a tabulation of the argument corresponding to certain percentage points of the probability integral is usually more convenient than that of the probability integral for equal intervals of the argument. A table of probabilities for the ratio of the mean square successive difference $\delta^2$ to the variance $s^2$,

$$P\left[\frac{\delta^2}{s^2} < k\right] = \int_0^k w\left(\frac{\delta^2}{s^2}\right) d\left(\frac{\delta^2}{s^2}\right),$$

where $w(\delta^2/s^2)$ is the distribution¹ of $\delta^2/s^2$, has been published recently² with $k$ as argument. The following table of values of $\delta^2/s^2$ for $P = .001$, $.01$ and $.05$ has been computed from it by interpolation.

[Table: Values of $\delta^2/s^2$ for Different Levels of Significance. Columns: Values of $k$; Values of $k'$.]

Since the distribution of $\delta^2/s^2$, $w(\delta^2/s^2)$, is symmetric³ about $E(\delta^2/s^2)$, $P(\delta^2/s^2 < k) = P(\delta^2/s^2 > k')$ if $E(\delta^2/s^2) - k = k' - E(\delta^2/s^2)$, where $E(\delta^2/s^2) = 2n/(n-1)$.³ The upper levels are rarely of practical use, since large values of the ratio could arise only from a somewhat artificial set of observations, such as alternately high and low values of the observed variable.
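The symmetry relation gives each upper level $k'$ directly from a lower level $k$: $k' = 2E(\delta^2/s^2) - k = 4n/(n-1) - k$. The sketch below also computes the ratio itself, assuming von Neumann's conventions $\delta^2 = \sum_{i=1}^{n-1}(x_{i+1}-x_i)^2/(n-1)$ and $s^2 = \sum_{i=1}^{n}(x_i-\bar x)^2/n$; these definitions are not restated in this note, but they are the ones for which $E(\delta^2/s^2) = 2n/(n-1)$. Function names are hypothetical.

```python
def mssd_ratio(xs):
    """delta^2 / s^2 with delta^2 = sum (x_{i+1} - x_i)^2 / (n - 1)
    and s^2 = sum (x_i - xbar)^2 / n."""
    n = len(xs)
    mean = sum(xs) / n
    delta2 = sum((b - a) ** 2 for a, b in zip(xs, xs[1:])) / (n - 1)
    s2 = sum((x - mean) ** 2 for x in xs) / n
    return delta2 / s2

def upper_level(k, n):
    """k' with P(delta^2/s^2 > k') = P(delta^2/s^2 < k), by symmetry about 2n/(n-1)."""
    expected = 2.0 * n / (n - 1)
    return 2.0 * expected - k

# An alternating series gives a ratio well above its expectation 2n/(n-1).
print(mssd_ratio([1.0, -1.0, 1.0, -1.0]), 2.0 * 4 / 3)
```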

The computation of this table of significance levels was made at the suggestion of Lt. Col. L. E. Simon.


¹ For determination of $w(\delta^2/s^2)$ cf. John von Neumann, "Distribution of the ratio of the mean square successive difference to the variance," Annals of Math. Stat., Vol. 12 (1941), pp. 367-395.

² B. I. Hart, "Tabulation of the probabilities for the ratio of the mean square successive difference to the variance," Annals of Math. Stat., Vol. 13 (1942), p. 213.

³ Loc. cit.¹, p. 372, for proof of symmetry and evaluation of $E(\delta^2/s^2)$.


A CORRECTION

By M. A. Girshick

U. S. Department of Agriculture, Washington

In my article "Notes on the Distribution of Roots of a Polynomial with Random Complex Coefficients" which appeared in the June 1942 issue of the Annals of Mathematical Statistics, the symbol $\sum_{i=1}^{n}\sum_{j=i+1}^{n}$ in formulas (13), (14), and (15) should be replaced by $\prod_{i=1}^{n}\prod_{j=i+1}^{n}$.



REPORT OF THE POUGHKEEPSIE MEETING OF THE INSTITUTE 

The Fifth Summer Meeting of the Institute of Mathematical Statistics was 
held at Vassar College, Tuesday and Wednesday, September 8-9, 1942, in 
conjunction with the meetings of the American Mathematical Society and the 
Mathematical Association of America. The following fifty-eight members of
the Institute attended the meeting:

K. J. Arnold, L. A. Aroian, K. J. Arrow, Walter Bartky, Felix Bernstein, C. I. Bliss, A. H. Bowker, J. H. Bushey, Belle Calderon, B. H. Camp, A. Cohen, Jr., A. H. Copeland, C. C. Craig, J. H. Curtiss, W. E. Deming, J. L. Doob, M. L. Elveback, Willy Feller, M. M. Flood, R. M. Foster, H. A. Freeman, T. N. E. Greville, C. C. Grove, E. J. Gumbel, Edward Helly, G. M. Hopper, Harold Hotelling, Dunham Jackson, R. E. Jolliffe, Irving Kaplansky, Karl Karsten, B. F. Kimball, Howard Levene, Eugene Lukacs, H. B. Mann, E. B. Mode, E. C. Molina, F. C. Mosteller, O. R. Mummery, M. L. Norden, E. G. Olds, Oystein Ore, Edward Paulson, Selby Robinson, F. E. Satterthwaite, Henry Scheffé, L. E. Simon, Mortimer Spiegelman, Arthur Stein, J. R. Tomlinson, A. W. Tucker, J. W. Tukey, D. F. Votaw, Jr., Abraham Wald, S. S. Wilks, E. W. Wilson, Jacob Wolfowitz, L. O. Young.

The opening session, on Tuesday afternoon, was devoted to contributed papers on Probability and Statistics and was held jointly with the American Mathematical Society. The Chairman was Professor Cecil C. Craig, University of Michigan, and the following papers were presented:

1. On the Theory of Testing Composite Hypotheses With One Constraint.

Henry Scheffé, Princeton University.

2. On the Consistency of a Class of Non-parametric Statistics.

Jacob Wolfowitz, Staten Island, N. Y.

3. Graphical Controls Based on Serial Numbers.

E. J. Gumbel, New School for Social Research.

4. Significance Tests for Multivariate Distributions.

D. S. Villars, United States Rubber Company. (Introduced by E. G. Olds.)

5. On the Choice of the Number of Class Intervals in the Application of the Chi-square Test.

H. B. Mann and Abraham Wald, Columbia University.

6. Generalized Poisson Distribution.

F. E. Satterthwaite, Aetna Life Insurance Company.

7. The Relationship of Fisher's Z Distribution to Student's T Distribution.

Leo A. Aroian, Hunter College.

8. On a Statistical Problem Arising in the Classification of an Individual Into One of Two Groups.

Abraham Wald, Columbia University.

9. Modern Statistical Methods in Penology.

Saly II. R, Strulk, Radoliflc College. 

Miriam van Waters, Framingham, Mass. 

10. Regularity of Label-sequences Under Configuration Transformations. 

T. N. E. Greville, Bureau of the Census. 

By Title: 

On the Ratio of the Variances of Two Normal Populations. 

Henry Scheffé, Princeton University.

Abstracts of these papers follow this report. 




MEETING


449 


On Wednesday morning Professor Harold Hotelling, Columbia University, acted as Chairman of a Symposium on Stochastic Processes. The following papers were presented:

j PerntUruc' nn«f fh*-*rrtnet 

A. H. Copeland, University of Michigan.

2 fvSfiitoma and franr Ptarhcnl Application* 

Willy Keller, Vn»v*>r*nly 

3. (itncrol Theory o*vl .tmehrsOonis h< I'hyi’irs. 
J. L. Doob, University of Illinois.

The session on Wednesday afternoon was held jointly with the American Mathematical Society. Lt. Col. Leslie E. Simon, U. S. A., served as Chairman, and the following papers on The Applicability of Mathematical Statistics to War Efforts were presented:

1. Statistical Prediction, With Special Reference to the Problem of Tolerance Limits.
S. S. Wilks, Princeton University.

Discussion: J. H. Curtiss, Cornell University.

2 (in Ikt Salute of SfnUhenuttit'ti StuUfUm in Quality Control. 

W. Edwards Deming, Bureau of the Census.

Discussion: Walter Bartky, University of Chicago.

A meeting of the Board of Directors was held on Tuesday evening. Following the joint dinner on Wednesday evening, a concert was given in Skinner Hall by members of the music department of Vassar College.

Edwin G. Olds,

Secretary



ABSTRACTS OF PAPERS 

(Presented on September 8, 1942, at the Poughkeepsie meeting of the Institute)

On the Theory of Testing Composite Hypotheses with One Constraint. Henry Scheffé, Princeton University.

A composite hypothesis with one constraint specifies the value of one and only one parameter of a set occurring in a distribution function. The theory of testing such hypotheses is not only of direct interest for many important problems, but is intimately related to Neyman's theory of confidence intervals (Phil. Trans. Roy. Soc. London, 1937). A method of Neyman (Bull. Soc. Math. France, 1935) for finding type B regions for testing these hypotheses is extended to the case of any number of nuisance parameters. Type B₁ regions are defined by generalizing the type A₁ regions of Neyman and Pearson (Stat. Res. Mem., 1936) to the case where nuisance parameters are present, and sufficient conditions are found that a type B region be also of type B₁. An interesting moment problem is encountered, in which the admissible functions are not of constant sign, and is solved for the case where the original distribution is multivariate normal.


On the Consistency of a Class of Non-Parametric Statistics. J. Wolfowitz, N. Y. City.

Let $X$ and $Y$ be two stochastic variables about whose distribution nothing is known except that they are continuous, and let it be required to test whether their distribution functions are the same. Let $V$ be the observed sequence of zeros and ones constructed as described elsewhere (Wald and Wolfowitz, Annals of Math. Stat., Vol. 11 (1940), p. 147). Suppose that the statistic $R(V)$ used to test the hypothesis is of the form $R(V) = \sum_j \varphi(l_j)$, where $l_j$ is the length of the $j$th run and $\varphi(x)$ a suitable function defined for all positive integral $x$. The notion of consistency, originated by Fisher for parametric problems, has already been extended to the non-parametric case (loc. cit.). The author now proves that, subject to reasonable conditions on $\varphi(x)$ and statistically unimportant restrictions on the alternatives to the null hypothesis, statistics of the type $R(V)$ are consistent. In particular, a statistic discussed by the author (Annals of Math. Stat., September, 1942)


and for which v(x) u 


* o 


belongs to the class covered by the theorem, 


Graphical Controls Based on Serial Numbers. E. J. Gumbel, New School for Social Research.

The index $m$ of the observed value $x_m$ ($m = 1, 2, \ldots, n$) is called its serial number. A value $x$ of a continuous statistical variable defined by a probability $W(x) = \lambda$ is called a grade (e.g. the median for $\lambda = \frac12$). The coordination of serial numbers with grades furnishes two graphical methods for comparing the observations and the theory, namely the equiprobability test based on $m = n\lambda$, and the return periods based on $m = n\lambda + \frac12$.

Starting from the distribution of the $m$th value, we determine the most probable serial number $\tilde m = n\lambda + \Delta$, where $\Delta$ depends upon the distribution. For a symmetrical distribution, the corrections $\Delta$ for two grades defined by $\lambda$ and $1 - \lambda$ are equal in absolute value and opposite in sign. Then no correction is needed for the median. For an asymmetrical distribution, we calculate the most probable serial number of the mode considered as an $m$th value. Thus the mode is obtained from the observations through the theory. In this case the mode is not the most precise $m$th value.

If $m$ is of the order $n$, the distribution of the $m$th value converges towards a normal




ABSTRACTS 


451 


distribution with an expectation given by $m = nW(x)$, and a standard deviation $\sigma(x)$, where $\sigma(x)\sqrt{n} = \sqrt{W(x)[1 - W(x)]}\,/\,W'(x)$. By attributing to each theoretical value $x$ its standard deviation, we obtain intervals $x \pm \sigma(x)$ which may be used for the control of the equiprobability test, the comparison of the observed step function with the frequency, and the comparison of the observed with the theoretical return periods. Besides, the standard error of the $m$th value leads to the precision of the determination of a constant obtained from a grade.
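The standard deviation $\sigma(x) = \sqrt{W(x)[1-W(x)]}\,/\,(W'(x)\sqrt{n})$ of the $m$th value can be evaluated for any continuous law with known $W$ and density $W'$. A minimal sketch, using the normal law from Python's `statistics.NormalDist` purely as an example (the function name is hypothetical):

```python
import math
from statistics import NormalDist

def mth_value_sd(x, n, dist):
    """sigma(x) = sqrt(W(x) * (1 - W(x))) / (W'(x) * sqrt(n))."""
    W = dist.cdf(x)
    density = dist.pdf(x)
    return math.sqrt(W * (1.0 - W)) / (density * math.sqrt(n))

std_normal = NormalDist()
# Control interval x +/- sigma(x) for the median of 100 observations.
sd = mth_value_sd(0.0, 100, std_normal)
print(-sd, sd)
```

For the median of a normal law this reduces to the familiar $\sqrt{\pi/2}/\sqrt{n}$.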

Significance Tests for Multivariate Distributions. D. S. Villars, U. S. Rubber Company.

The observed mean of sets of $m$ variates, each normally and independently distributed, is distributed around the population mean according to a $\chi^2$ distribution with $m$ degrees of freedom. The sum of squares of deviations of $n$ observed points from the observed mean is distributed as $\chi^2$ with $m(n - 1)$ degrees of freedom (not with $n - 1$). A much more powerful test for correlation than that by the correlation coefficient is described, which, for bivariate distributions, involves comparisons between $n - 1$ and $n - 1$ degrees of freedom. This can be extended to $m - 1$ tests with $m$ variates. Distribution of distance between two means and distribution of fiducial radius is worked out in detail for two variates.


On the Choice of the Number of Class Intervals in the Application of the Chi-Square Test. H. B. Mann and A. Wald, Columbia University.

The distance of two distribution functions is defined as the l.u.b. of the absolute value of the difference between the two cumulative distribution functions. Let $C(\Delta)$ be the class of alternatives with distance $\geq \Delta$ from the null hypothesis. Let $f(N, k, \Delta)$ be the g.l.b. of the power with respect to alternatives in $C(\Delta)$ of the chi-square test with sample size $N$ and $k$ equally probable class intervals. A positive integer $k$ is called best with respect to sample size $N$ if there exists a $\Delta$ such that $f(N, k, \Delta) = \tfrac{1}{2}$ and $f(N, k', \Delta) \leq \tfrac{1}{2}$ for every positive integer $k'$. The authors show that $k_N = 4\left[2(N-1)^2/c^2\right]^{1/5}$, where $c$ is chosen so that $\frac{1}{\sqrt{2\pi}}\int_c^\infty e^{-x^2/2}\,dx$ is equal to the size of the critical region, fulfills approximately the conditions of a best $k$, with $\Delta_N = 5/k_N$ as the corresponding value of $\Delta$. The approximation is shown to be satisfactory for $N \geq 450$ if the 5% level of significance is used and for $N \geq 300$ if the 1% level is used.
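The printed formula is only partly legible in the scan. The sketch below assumes $k_N = 4[2(N-1)^2/c^2]^{1/5}$ and $\Delta_N = 5/k_N$, with $c$ the upper-$\alpha$ point of the standard normal law. The function names are hypothetical.

```python
from statistics import NormalDist


def mann_wald_k(N, alpha):
    """Approximately best number of equiprobable class intervals,
    k = 4 * (2 (N - 1)^2 / c^2)^(1/5), where c is the upper-alpha point of N(0, 1)."""
    c = NormalDist().inv_cdf(1.0 - alpha)
    return 4.0 * (2.0 * (N - 1) ** 2 / c ** 2) ** 0.2


def mann_wald_delta(N, alpha):
    """Corresponding distance Delta_N = 5 / k_N."""
    return 5.0 / mann_wald_k(N, alpha)


for N, alpha in [(450, 0.05), (300, 0.01)]:
    k = mann_wald_k(N, alpha)
    print(N, alpha, round(k, 1), round(mann_wald_delta(N, alpha), 3))
```

In practice $k$ would be rounded to an integer. The value grows only like $N^{2/5}$, so doubling the sample size raises $k$ by roughly 32%.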


Generalized Poisson Distribution. F. E. Satterthwaite, Aetna Life Insurance Company.

In this paper the Poisson distribution is generalized to allow for the assignment of varying weights to a set of events when the number of events follows the Poisson law. The development used brings out the fact that distributions falling in this class do not require that the underlying statistics be homogeneous. The only requirement is that they be independent. Formulas are given for the moments of the generalized distribution as functions of the moments of the underlying distribution of weights. The principles to be observed in the solution of practical problems are outlined.
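The abstract does not reproduce its moment formulas. As an illustration only, the textbook compound-Poisson identity gives the first three moments: if the number of events is Poisson with mean $\lambda$ and the weights $W$ are independent, then the cumulant of order $r$ of the weighted total is $\lambda E[W^r]$. This may differ in form from the paper's formulas. The helper name and the example weights are assumptions.

```python
def compound_poisson_moments(lam, weights, probs):
    """Mean, variance and third central moment of S = W_1 + ... + W_K,
    where K ~ Poisson(lam) and the W_i are independent with the given
    discrete distribution. Uses the compound-Poisson identity
    kappa_r(S) = lam * E[W^r] (the first three cumulants equal the
    mean, variance and third central moment)."""
    def raw(r):
        return sum(p * w ** r for w, p in zip(weights, probs))
    return lam * raw(1), lam * raw(2), lam * raw(3)


# Equal weights of 1 recover the ordinary Poisson law: all three equal lam.
print(compound_poisson_moments(3.0, [1.0], [1.0]))  # -> (3.0, 3.0, 3.0)
# Hypothetical weights 1 and 2, equally likely.
print(compound_poisson_moments(3.0, [1.0, 2.0], [0.5, 0.5]))  # -> (4.5, 7.5, 13.5)
```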


The Relationship of Fisher's z Distribution to Student's t Distribution. Leo A. Aroian, Hunter College.

For $n_1$ and $n_2$ sufficiently large, $W = z/P$ is distributed as Student's $t$ with $N$ degrees of freedom, where $N = n_1 + n_2 - 1$ and $P^2 = \tfrac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. If the level of significance is $\alpha$ for Student's distribution, the level of significance for $z$ will be $z_\alpha = P\,t_\alpha$. As a corollary it follows that the distribution of $z$ approaches normality as $n_1, n_2 \to \infty$, with mean zero and variance $\tfrac{1}{2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$. This simplifies a previous proof of the author. Application of this result is made to finding levels of significance of the $z$ distribution. On the whole, R. A. Fisher's formulas for this purpose ($n_1$ and $n_2$ large), as modified by W. G. Cochran, are superior. The results given by the Fisher-Cochran formulae are compared with those obtained by using the formula recently found by E. Paulson.
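The sketch below uses only the limiting normal law stated in the corollary: mean zero and variance $\tfrac{1}{2}(1/n_1 + 1/n_2)$. It does not use the $t$ refinement or the Fisher-Cochran formulas, which the abstract rates as superior. It returns an approximate upper-$\alpha$ point of $z = \tfrac{1}{2}\ln F$ and converts that point to a variance ratio. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_critical_normal(n1, n2, alpha):
    """Large-sample upper-alpha point of Fisher's z = (1/2) ln F, using the
    limiting normal law with mean 0 and variance (1/2)(1/n1 + 1/n2)."""
    P = math.sqrt(0.5 * (1.0 / n1 + 1.0 / n2))
    return P * NormalDist().inv_cdf(1.0 - alpha)


def z_to_F(z):
    """Convert a z value to the corresponding variance ratio F = exp(2z)."""
    return math.exp(2.0 * z)


z05 = z_critical_normal(60, 60, 0.05)
print(round(z05, 4), round(z_to_F(z05), 3))
```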


On a Statistical Problem Arising in the Classification of an Individual in One of Two Groups. Abraham Wald, Columbia University.

Let $\pi_1$ and $\pi_2$ be two $p$-variate normal populations which have a common covariance matrix. A sample of size $N_i$ is drawn from the population $\pi_i$ $(i = 1, 2)$. Denote by $x_{i\alpha}$ the $\alpha$th observation on the $i$th variate in $\pi_1$, and by $y_{i\beta}$ the $\beta$th observation on the $i$th variate in $\pi_2$. Let $z_i$ $(i = 1, \ldots, p)$ be a single observation on the $i$th variate drawn from a population $\pi$, where it is known that $\pi$ is equal either to $\pi_1$ or to $\pi_2$. The parameters of the populations $\pi_1$ and $\pi_2$ are assumed to be unknown. It is shown that for testing the hypothesis $\pi = \pi_1$ a proper critical region is given by $U \geq d$, where $U = \sum_i \sum_j s^{ij} z_i (\bar{y}_j - \bar{x}_j)$, $\|s^{ij}\| = \|s_{ij}\|^{-1}$,
$s_{ij} = \left[\sum_\alpha (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j) + \sum_\beta (y_{i\beta} - \bar{y}_i)(y_{j\beta} - \bar{y}_j)\right]/(N_1 + N_2 - 2)$,
$\bar{x}_i = \left(\sum_\alpha x_{i\alpha}\right)/N_1$, $\bar{y}_i = \left(\sum_\beta y_{i\beta}\right)/N_2$, and $d$ is a constant. The large-sample distribution of $U$ is derived, and it is shown that $U$ is a simple function of three angles in the sample space whose exact joint sampling distribution is derived.


Modern Statistical Methods in Penology. Saly R. R. Struik, Radcliffe College, and Miriam van Waters, Massachusetts Reformatory for Women.

In applying statistical methods to penological problems, the best-known studies so far have considered 100, 600, or, once in England (to refute Lombroso's theory), 1500 cases. But from the correct statistical standpoint, far more cases are needed to establish a law. Over a period of years, an attempt has been made to use statistical methods in the study of penological problems in the Massachusetts Reformatory for Women, but the results will take on real significance and be conclusive only when similar investigations are made all over the United States.

Regularity of Label-Sequences Under Configuration Transformations. T. N. E. Greville, Bureau of the Census.

A class of transformations on sequences of arbitrary labels is developed, in terms of which a wide variety of problems in the theory of probability can be formulated. It is shown that, with mild restrictions on the transformations used and on the measure function assumed on the label-space, almost every label-sequence produces a transform having the expected frequency distribution. The class of transformations considered is shown to include as special cases the four fundamental operations of von Mises: place selection, partition, mixing, and combination.
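Two of the von Mises operations named above, place selection and mixing, can be shown on a simulated die sequence. This is a sketch for illustration only. It does not implement Greville's configuration transformations, and the helper names, the die example and the sequence length are assumptions. A place selection decides whether to keep a place using only the labels before it. On a random sequence, the selected subsequence keeps the original relative frequencies, and mixing yields the summed frequencies.

```python
import random


def place_selection(seq, rule):
    """von Mises place selection: keep each label when rule(past) is true,
    where past holds only the labels that precede it."""
    selected, past = [], []
    for label in seq:
        if rule(past):  # decision never looks at the current label
            selected.append(label)
        past.append(label)
    return selected


def mixing(seq, f):
    """von Mises mixing: map each label to a coarser label via f."""
    return [f(label) for label in seq]


def freq(seq, label):
    return seq.count(label) / len(seq)


rng = random.Random(0)
dice = [rng.randint(1, 6) for _ in range(60000)]
# Select the places that immediately follow a six.
after_six = place_selection(dice, lambda past: bool(past) and past[-1] == 6)
# Mix the six faces into "even" / "odd".
parity = mixing(dice, lambda d: "even" if d % 2 == 0 else "odd")
print(round(freq(after_six, 1), 3), round(freq(parity, "even"), 3))
```

In the selected subsequence, the frequency of a one stays near 1/6. After mixing, the frequency of "even" is near 1/2.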







On the Ratio of the Variances of Two Normal Populations. Henry Scheffé, Princeton University.

Let $\theta$ be the variance ratio. [...] intervals for $\theta$. [...] only solutions based on the $F$ distribution [...] approaches to [...] are considered. It turns out that the limits [...] which yield confidence intervals optimum in a certain sense [...]. These limits are difficult to compute, and [...] efficiency in using instead the [...] "equal tails" limits. The second part of the paper is concerned with the relations of [...] regions and the application of Neyman's theory of confidence intervals. No new tests or confidence intervals not already considered in Part I are obtained, but those previously judged best of a very narrow class are [...] of all those based on similar regions of the same size.














